An Automatic Sound Classification Framework with Non-volatile Memory

Wu, Jibin; Chua, Yansong; Zhang, Malu; Li, Haizhou; Tan, Kay Chen

doi:10.1007/978-981-15-6912-8_13



Abstract

Environmental sounds form part of our daily life. With the advancement of deep learning models and the abundance of training data, the performance of automatic sound classification (ASC) systems has improved significantly in recent years. However, their high computational cost, and hence high power consumption, remains a major hurdle for the large-scale deployment of ASC systems on mobile and wearable devices. Motivated by the observation that humans analyze complex audio scenes highly effectively while consuming little power, a biologically plausible ASC framework, namely SOM-SNN, is introduced. Emerging dense crossbar arrays of non-volatile memory (NVM) devices have been recognized as a promising approach to emulating such distributed, massively parallel, and densely connected neuromorphic computing systems. This chapter presents the general structure of this framework for sound event and speech recognition, demonstrating its attractive computational benefits and its suitability for NVM implementation.
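The abstract's central hardware claim is that a dense NVM crossbar can emulate a densely connected spiking network. The sketch below illustrates that idea under assumptions that are not taken from this chapter: the crossbar is ideal (no wire resistance, sneak paths, or device nonlinearity), so each column current is the sum over rows of conductance times read voltage (Ohm's law per device, Kirchhoff's current law per column), and those currents drive one simple discrete-time leaky integrate-and-fire (LIF) neuron per column. The function names `crossbar_currents` and `lif_run`, the parameters `tau`, `threshold` and `v_reset`, and all numeric values are hypothetical illustrations, not the SOM-SNN implementation described by the authors.

```python
import numpy as np


def crossbar_currents(conductances, voltages):
    """Ideal crossbar read (assumption: no wire resistance, sneak paths or device
    nonlinearity). Device (i, j) passes current G[i, j] * V[i] (Ohm's law) and each
    column sums its device currents (Kirchhoff's current law), so I = V @ G.
    `voltages` may be a single vector (rows,) or a batch of frames (T, rows)."""
    return np.asarray(voltages) @ np.asarray(conductances)


def lif_run(input_currents, tau=20.0, dt=1.0, threshold=1.0, v_reset=0.0):
    """Hypothetical discrete-time leaky integrate-and-fire neurons, one per column.
    input_currents: array of shape (T, N). Returns a boolean spike raster (T, N)."""
    input_currents = np.asarray(input_currents, dtype=float)
    T, N = input_currents.shape
    v = np.full(N, v_reset, dtype=float)   # membrane potentials
    spikes = np.zeros((T, N), dtype=bool)
    decay = np.exp(-dt / tau)              # exponential leak toward 0 per time step
    for t in range(T):
        v = decay * v + input_currents[t]  # leak, then integrate this step's input
        fired = v >= threshold
        spikes[t] = fired
        v[fired] = v_reset                 # reset the neurons that spiked
    return spikes


# Usage with hypothetical values (arbitrary units): 4 input rows, 3 output columns.
rng = np.random.default_rng(0)
G = rng.uniform(0.0, 1.0, size=(4, 3))    # device conductances act as synaptic weights
V = rng.uniform(0.0, 0.5, size=(50, 4))   # one row of read voltages per time step
spikes = lif_run(crossbar_currents(G, V))
print(spikes.sum(axis=0))                 # spike count per output neuron
```

The whole layer's weighted sum is computed in one analog read of the array, which is why crossbars suit densely connected networks. Because device conductances are non-negative, signed synaptic weights are commonly realized with a differential pair of devices per synapse (weight proportional to G_plus - G_minus); the sketch keeps only positive conductances for brevity.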



Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore
Jibin Wu, Malu Zhang & Haizhou Li
Institute for Infocomm Research, A*STAR, Singapore, Singapore
Yansong Chua
Department of Computer Science, City University of Hong Kong, Hong Kong, China
Kay Chen Tan


Corresponding author

Correspondence to Jibin Wu.

Editor information

Editors and Affiliations

School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
Wen Siang Lew
School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
Gerard Joseph Lim
School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
Putu Andhita Dananjaya


About this chapter

Cite this chapter

Wu, J., Chua, Y., Zhang, M., Li, H., Tan, K.C. (2021). An Automatic Sound Classification Framework with Non-volatile Memory. In: Lew, W.S., Lim, G.J., Dananjaya, P.A. (eds) Emerging Non-volatile Memory Technologies. Springer, Singapore. https://doi.org/10.1007/978-981-15-6912-8_13


DOI: https://doi.org/10.1007/978-981-15-6912-8_13
Published: 10 January 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6910-4
Online ISBN: 978-981-15-6912-8
eBook Packages: Physics and Astronomy, Physics and Astronomy (R0)
