Abstract
Environmental sounds form part of our daily life. With the advancement of deep learning models and the abundance of training data, the performance of automatic sound classification (ASC) systems has improved significantly in recent years. However, their high computational cost, and hence high power consumption, remains a major hurdle for large-scale deployment of ASC systems on mobile and wearable devices. Motivated by the observation that humans analyze complex audio scenes highly effectively while consuming little power, this chapter introduces a biologically plausible ASC framework, namely SOM-SNN. Emerging dense crossbar arrays of non-volatile memory (NVM) devices have been recognized as a promising approach to emulating such distributed, massively parallel and densely connected neuromorphic computing systems. The chapter presents the general structure of the framework for sound event and speech recognition, demonstrating its attractive computational benefits and its suitability for an NVM implementation.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Wu, J., Chua, Y., Zhang, M., Li, H., Tan, K.C. (2021). An Automatic Sound Classification Framework with Non-volatile Memory. In: Lew, W.S., Lim, G.J., Dananjaya, P.A. (eds) Emerging Non-volatile Memory Technologies. Springer, Singapore. https://doi.org/10.1007/978-981-15-6912-8_13
DOI: https://doi.org/10.1007/978-981-15-6912-8_13
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6910-4
Online ISBN: 978-981-15-6912-8
eBook Packages: Physics and Astronomy (R0)