
Baby Cry Sound Detection: A Comparison of Hand Crafted Features and Deep Learning Approach

  • Conference paper
  • First Online:
Engineering Applications of Neural Networks (EANN 2017)

Abstract

Baby cry sound detection allows parents to be automatically alerted when their baby is crying. Current solutions for the home environment rely on a client-server architecture in which an end-node device streams audio to a centralized server in charge of the detection. Although such solutions offer the best performance, they raise power-consumption and privacy concerns. For these reasons, interest has recently grown in the community in methods that can run locally on battery-powered devices. This work presents a new set of features tailored to baby cry sound recognition, called hand crafted baby cry (HCBC) features. The proposed method is compared with a baseline using mel-frequency cepstrum coefficients (MFCCs) and with a state-of-the-art convolutional neural network (CNN) system. HCBC features prove to be on par with the CNN, while requiring less computation and memory, at the cost of being application specific.
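
To give a concrete picture of the MFCC baseline mentioned in the abstract, the sketch below outlines frame-level MFCC extraction followed by a binary frame classifier. It is only illustrative and not the authors' implementation: librosa and scikit-learn, the 16 kHz sample rate, the 25 ms / 10 ms framing, and the SVM classifier are assumptions made for the example.

    # Illustrative sketch of an MFCC-based baseline for frame-level baby cry
    # detection. NOT the paper's implementation: librosa, scikit-learn, and
    # all parameter values (sample rate, framing, kernel) are assumptions.
    import numpy as np
    import librosa
    from sklearn.svm import SVC

    def extract_mfcc_frames(path, sr=16000, n_mfcc=13):
        """Load an audio file and return one MFCC vector per analysis frame."""
        y, sr = librosa.load(path, sr=sr)
        # 25 ms windows with a 10 ms hop (assumed, typical for audio analysis)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                    n_fft=int(0.025 * sr),
                                    hop_length=int(0.010 * sr))
        return mfcc.T  # shape: (n_frames, n_mfcc)

    def train_baseline(paths, labels):
        """Train a frame-level classifier from (audio path, binary label) pairs,
        where 1 = baby cry and 0 = other domestic sound (hypothetical data)."""
        X, y = [], []
        for path, label in zip(paths, labels):
            frames = extract_mfcc_frames(path)
            X.append(frames)
            y.extend([label] * len(frames))
        clf = SVC(kernel="rbf", probability=True)  # assumed classifier choice
        clf.fit(np.vstack(X), y)
        return clf

In this layout, the HCBC features proposed in the paper would replace the MFCC extraction step, while the CNN system compared against would replace both the feature extractor and the classifier.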


Notes

  1. http://www.audiomicro.com, https://www.freesound.org, https://www.pond5.com, https://www.soundsnap.com.

  2. http://www.ee.columbia.edu/ln/rosa/matlab/rastamat/.

  3. http://lasagne.readthedocs.io/en/latest/.

  4. http://deeplearning.net/software/theano/.



Author information


Corresponding author

Correspondence to Daniele Battaglino.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Torres, R., Battaglino, D., Lepauloux, L. (2017). Baby Cry Sound Detection: A Comparison of Hand Crafted Features and Deep Learning Approach. In: Boracchi, G., Iliadis, L., Jayne, C., Likas, A. (eds) Engineering Applications of Neural Networks. EANN 2017. Communications in Computer and Information Science, vol 744. Springer, Cham. https://doi.org/10.1007/978-3-319-65172-9_15


  • DOI: https://doi.org/10.1007/978-3-319-65172-9_15

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65171-2

  • Online ISBN: 978-3-319-65172-9

  • eBook Packages: Computer Science, Computer Science (R0)
