
Baby Cry Sound Detection: A Comparison of Hand Crafted Features and Deep Learning Approach

  • Conference paper
  • First Online:
Engineering Applications of Neural Networks (EANN 2017)

Abstract

Baby cry sound detection allows parents to be automatically alerted when their baby is crying. Current solutions for the home environment rely on a client-server architecture in which an end-node device streams audio to a centralized server in charge of the detection. Although such solutions offer the best performance, they raise power-consumption and privacy concerns. For these reasons, interest has recently grown in the community in methods that can run locally on battery-powered devices. This work presents a new set of features tailored to baby cry sound recognition, called hand crafted baby cry (HCBC) features. The proposed method is compared with a baseline using mel-frequency cepstrum coefficients (MFCCs) and with a state-of-the-art convolutional neural network (CNN) system. HCBC features prove to be on par with the CNN, while requiring less computation and memory, at the cost of being application specific.
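
To give a concrete picture of the MFCC baseline mentioned in the abstract, the sketch below outlines frame-level MFCC extraction followed by a binary frame classifier. It is only illustrative and not the authors' implementation: librosa and scikit-learn, the 16 kHz sample rate, the 25 ms / 10 ms framing, and the SVM classifier are assumptions made for the example.

    # Illustrative sketch of an MFCC-based baseline for frame-level baby cry
    # detection. NOT the paper's implementation: librosa, scikit-learn, and
    # all parameter values (sample rate, framing, kernel) are assumptions.
    import numpy as np
    import librosa
    from sklearn.svm import SVC

    def extract_mfcc_frames(path, sr=16000, n_mfcc=13):
        """Load an audio file and return one MFCC vector per analysis frame."""
        y, sr = librosa.load(path, sr=sr)
        # 25 ms windows with a 10 ms hop (assumed, typical for audio analysis)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                    n_fft=int(0.025 * sr),
                                    hop_length=int(0.010 * sr))
        return mfcc.T  # shape: (n_frames, n_mfcc)

    def train_baseline(paths, labels):
        """Train a frame-level classifier from (audio path, binary label) pairs,
        where 1 = baby cry and 0 = other domestic sound (hypothetical data)."""
        X, y = [], []
        for path, label in zip(paths, labels):
            frames = extract_mfcc_frames(path)
            X.append(frames)
            y.extend([label] * len(frames))
        clf = SVC(kernel="rbf", probability=True)  # assumed classifier choice
        clf.fit(np.vstack(X), y)
        return clf

In this layout, the HCBC features proposed in the paper would replace the MFCC extraction step, while the CNN system compared against would replace both the feature extractor and the classifier.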


Notes

  1. http://www.audiomicro.com, https://www.freesound.org, https://www.pond5.com, https://www.soundsnap.com.

  2. http://www.ee.columbia.edu/ln/rosa/matlab/rastamat/.

  3. http://lasagne.readthedocs.io/en/latest/.

  4. http://deeplearning.net/software/theano/.



Author information


Corresponding author

Correspondence to Daniele Battaglino.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Torres, R., Battaglino, D., Lepauloux, L. (2017). Baby Cry Sound Detection: A Comparison of Hand Crafted Features and Deep Learning Approach. In: Boracchi, G., Iliadis, L., Jayne, C., Likas, A. (eds) Engineering Applications of Neural Networks. EANN 2017. Communications in Computer and Information Science, vol 744. Springer, Cham. https://doi.org/10.1007/978-3-319-65172-9_15


  • DOI: https://doi.org/10.1007/978-3-319-65172-9_15

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65171-2

  • Online ISBN: 978-3-319-65172-9

  • eBook Packages: Computer Science, Computer Science (R0)
