Baby Cry Detection: Deep Learning and Classical Approaches

  • Rami CohenEmail author
  • Dima Ruinskiy
  • Janis Zickfeld
  • Hans IJzerman
  • Yizhar Lavner
Part of the Studies in Computational Intelligence book series (SCI, volume 867)


In this chapter, we compare deep learning and classical approaches for detection of baby cry sounds in various domestic environments under challenging signal-to-noise ratio conditions. Automatic cry detection has applications in commercial products (such as baby remote monitors) as well as in medical and psycho-social research. We design and evaluate several convolutional neural network (CNN) architectures for baby cry detection, and compare their performance to that of classical machine-learning approaches, such as logistic regression and support vector machines. In addition to feed-forward CNNs, we analyze the performance of recurrent neural network (RNN) architectures, which are able to capture temporal behavior of acoustic events. We show that by carefully designing CNN architectures with specialized non-symmetric kernels, better results are obtained compared to common CNN architectures.


Baby cry detection Deep learning Convolutional neural networks Audio detection 


  1. 1.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105. Curran Associates, Inc. (2012).
  2. 2.
    Hershey, S., Chaudhuri, S., Ellis, D.P., Gemmeke, J.F., Jansen, A., Moore, R.C., Plakal, M., Platt, D., Saurous, R.A., Seybold, B.: CNN architectures for large-scale audio classification. In: IEEE International Conference on Acoustics, Speech and Signal Processing (icassp), pp. 131–135. IEEE (2017)Google Scholar
  3. 3.
    Cakir, E., Heittola, T., Huttunen, H., Virtanen, T.: Polyphonic sound event detection using multi label deep neural networks. In: International Joint Conference on Neural Networks (IJCNN), pp. 1–7. IEEE (2015)Google Scholar
  4. 4.
    Ramírez, J., Górriz, J.M., Segura, J.C.: Voice Activity Detection. Fundamentals And Speech Recognition System Robustness (2007)Google Scholar
  5. 5.
    Ruinskiy, D., Lavner, Y.: An effective algorithm for automatic detection and exact demarcation of breath sounds in speech and song signals. IEEE Trans. Audio Speech Lang. Process. 15, 838–850 (2007)CrossRefGoogle Scholar
  6. 6.
    Kong, Y.-Y., Mullangi, A., Kokkinakis, K.: Classification of fricative consonants for speech enhancement in hearing devices. PloS one (2014)Google Scholar
  7. 7.
    Frid, A., Lavner, Y.: Spectral and textural features for automatic classification of fricatives. In: XXII Annual Pacific Voice Conference (PVC), pp. 1–4 (2014)Google Scholar
  8. 8.
    Panagiotakis, C., Tziritas, G.: A speech/music discriminator based on rms and zero-crossings. IEEE Trans. Multimed. 7, 155–166 (2005)CrossRefGoogle Scholar
  9. 9.
    Lavner, Y., Ruinskiy, D.: A decision-tree-based algorithm for speech/music classification and segmentation. EURASIP J. Audio Speech Music Process. (2009)Google Scholar
  10. 10.
    Tzanetakis, G., Cook, P.: Musical genre classification of audio signals. IEEE Trans. Speech Audio Process. 10(5), 293–302 (2002)CrossRefGoogle Scholar
  11. 11.
    Barchiesi, D., Giannoulis, D., Stowell, D., Plumbley, M.D.: Acoustic scene classification: classifying environments from the sounds they produce. IEEE Signal Process. Mag. 32(3), 16–34 (2015)CrossRefGoogle Scholar
  12. 12.
    Morgan, N., Bourlard, H.: Continuous speech recognition. IEEE Signal Process. Mag. 12(3), 24–42 (1995)CrossRefGoogle Scholar
  13. 13.
    Aruna, C., Parameswari, A.D., Malini, M., Gopu, G.: Voice recognition and touch screen control based wheel chair for paraplegic persons. In: 2014 International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE), pp. 1–5 (2014)Google Scholar
  14. 14.
    Carletti, V., Foggia, P., Percannella, G., Saggese, A., Strisciuglio, N., Vento, M.: Audio surveillance using a bag of aural words classifier. In: 2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance, pp. 81–86 (2013)Google Scholar
  15. 15.
    Ye, J., Kobayashi, T., Higuchi, T.: Audio-based indoor health monitoring system using flac features. In: 2010 International Conference on Emerging Security Technologies, pp. 90–95 (2010)Google Scholar
  16. 16.
    Kawano, D., Ogawa, T., Matsumoto, H.: A proposal of the method to suppress a click noise only from an observed audio signal. In: 2017 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), pp. 93–96 (2017)Google Scholar
  17. 17.
    Zhang, H., McLoughlin, I., Song, Y.: Robust sound event recognition using convolutional neural networks. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 559–563 (2015)Google Scholar
  18. 18.
    Piczak, K.J.: Environmental sound classification with convolutional neural networks. In: IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6 (2015)Google Scholar
  19. 19.
    Valenti, M., Diment, A., Parascandolo, G., Squartini, S., Virtanen, T.: DCASE 2016 acoustic scene classification using convolutional neural networks. In: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016). Tampere University of Technology. Department of Signal Processing (2016)Google Scholar
  20. 20.
    Çakır, E., Parascandolo, G., Heittola, T., Huttunen, H., Virtanen, T.: Convolutional recurrent neural networks for polyphonic sound event detection. IEEE/ACM Trans. Audio Speech Lang. Process. 25(6), 1291–1303 (2017)CrossRefGoogle Scholar
  21. 21.
    Naithani, G., Barker, T., Parascandolo, G., Bramslw, L., Pontoppidan, N.H., Virtanen, T.: Low latency sound source separation using convolutional recurrent neural networks. In: 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 71–75 (2017)Google Scholar
  22. 22.
    Dieleman, S., Schrauwen, B.: End-to-end learning for music audio. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6964–6968 (2014)Google Scholar
  23. 23.
    Pons, J., Nieto, O., Prockup, M., Schmidt, E.M., Ehmann, A.F., Serra, X.: End-to-end learning for music audio tagging at scale. In: ISMIR (2018)Google Scholar
  24. 24.
    Ferretti, D., Severini, M., Principi, E., Cenci, A., Squartini,S.: Infant cry detection in adverse acoustic environments by using deep neural networks. In: EUSIPCO (2018)Google Scholar
  25. 25.
    Turan, M.A.T., Erzin, E.: Monitoring infant’s emotional cry in domestic environments using the capsule network architecture. In: Interspeech (2018)Google Scholar
  26. 26.
    Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. In: NIPS (2017)Google Scholar
  27. 27.
    Torres, R., Battaglino, D., Lepauloux, L.: Baby cry sound detection: a comparison of hand crafted features and deep learning approach. In: EANN (2017)Google Scholar
  28. 28.
    Saraswathy, J., Hariharan, M., Yaacob, S., Khairunizam, W.: Automatic classification of infant cry: a review. In: 2012 International Conference on Biomedical Engineering (ICoBE), pp. 543–548 (2012)Google Scholar
  29. 29.
    Lavner, Y., Cohen, R., Ruinskiy, D., IJzerman, H.:Baby cry detection in domestic environment using deep learning. In: 2016 International Conference on the Sceience of Electrical Engineering (ICSEE 2016) (2016)Google Scholar
  30. 30.
    Zhang, X., Zou, Y., Liu, Y.: AICDS: An Infant Crying Detection System Based on Lightweight Convolutional Neural Network, pp. 185–196. Springer (2018)Google Scholar
  31. 31.
    Xu, Y., Hasegawa-Johnson, M., McElwain, N.: Infant emotional outbursts detection in infant-parent spoken interactions. In: Interspeech (2018)Google Scholar
  32. 32.
    Silva, G., Wickramasinghe, D.: Infant cry detection system with automatic soothing and video monitoring functions. J. Eng. Technol. Open Univ. Sri Lanka (JET-OUSL) 5(1). (2017)
  33. 33.
    Gao, J., Pabon, L.: Hot car baby detector. Illinois College of Engineering, Technical Report, December 2014Google Scholar
  34. 34.
    Lollipop smart baby monitor. (2018)
  35. 35.
    Cocoon cam baby monitor. (2019)
  36. 36.
    Evoz wifi baby vision monitor. (2019)
  37. 37.
    Varallyay, G.: The melody of crying. Int. J. Pediatr. Otorhinolaryngol. 71(11), 1699–1708 (2007)CrossRefGoogle Scholar
  38. 38.
    Zabidi, A., Khuan, L.Y., Mansor, W., Yassin, I.M., Sahak, R.: Classification of infant cries with asphyxia using multilayer perceptron neural network. In: Proceedings of the 2010 Second International Conference on Computer Engineering and Applications—Series. ICCEA 2010, vol. 01, pp. 204–208. IEEE Computer Society, Washington, DC, USA (2010)Google Scholar
  39. 39.
    Orlandi, S., Reyes-Garcia, C.A., Bandini, A., Donzelli, G., Manfredi, C.: Application of pattern recognition techniques to the classification of full-term and preterm infant cry. J. Voice Off. J. Voice Found. 30, 10 (2015)Google Scholar
  40. 40.
    Michelsson, K., Michelsson, O.: Phonation in the newborn, infant cry. Int. J. Pediatr. Otorhinolaryngol. 49, S297–S301 (1999)CrossRefGoogle Scholar
  41. 41.
    Bowlby, J.: Attachment and Loss. Basic Books, vol. 1 (1969)Google Scholar
  42. 42.
    Ostwald, P.: The sounds of infancy. Dev. Med. Child Neurol. 14(3), 350–361 (1972)CrossRefGoogle Scholar
  43. 43.
    Owings, D., Zeifman, D.: Human infant crying as an animal communication system: insights from an assessment/management approach. In: Evolution of Communication Systems: A Comparative Approach, pp. 151–170 (2004)Google Scholar
  44. 44.
    Nelson, J.: Seeing Through Tears: Crying and Attachment. Routledge (2005)Google Scholar
  45. 45.
    IJzerman, H., et al.: A theory of social thermoregulation in human primates. Front. Psychol. 6, 464 (2015)Google Scholar
  46. 46.
    Butler, E.A., Randall, A.K.: Emotional coregulation in close relationships. Emot. Rev. 5(2), 202–210 (2013)CrossRefGoogle Scholar
  47. 47.
    LaGasse, L.L., Neal, A.R., Lester, B.M.: Assessment of infant cry: a coustic cry analysis and parental perception. Ment. Retard. Dev. Disabil. Res. Rev. 11(1), 83–93 (2005)CrossRefGoogle Scholar
  48. 48.
    Hendriks, M., Nelson, J.K., Cornelius, R., Vingerhoets, A.: Why crying improves our well-being: an attachment-theory perspective on the functions of adult crying. In: Emotion Regulation: Conceptual and Clinical Issues, pp. 87–96 (2008)Google Scholar
  49. 49.
    Pal, P.A., Iyer, N., Yantorno, R.E.: Emotion detection from infant facial expressions and cries. In: 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, vol. 2, pp. II–II (2006)Google Scholar
  50. 50.
    Barajas-Montiel, S., Reyes-Garcia, C.A.: Identifying pain and hunger in infant cry with classifiers ensembles, pp. 770 – 775 (2005)Google Scholar
  51. 51.
    Wasz-Höckert, O.: The infant cry: a spectrographic and auditory analysis. Spastics International Medical Publications in association with W. Heinemann Medical Books. Series Clinics in Developmental Medicine (1968)Google Scholar
  52. 52.
    Vingerhoets, A.: Why Only Humans Weep: Unravelling the Mysteries of Tears. Oxford University Press (2013)Google Scholar
  53. 53.
    Bell, S.M., Salter Ainsworth, M.D.: Infant crying and maternal responsiveness. In: Child development, vol. 43, pp. 1171–90 (1973)CrossRefGoogle Scholar
  54. 54.
    Lounsbury, M.L., Bates, J.E.: The cries of infants of differing levels of perceived temperamental difficultness: acoustic properties and effects on listeners. Child Dev. 53(3), 677–686 (1982)CrossRefGoogle Scholar
  55. 55.
    Zeskind, P., Barr, R.: Acoustic characteristics of naturally occurring cries of infants with colic. Child Dev. 68, 394–403 (1997)CrossRefGoogle Scholar
  56. 56.
    Laan, A., Assen, M.V., Vingerhoets, A.: Individual differences in adult crying: the role of attachment styles. Soc. Behav. Person. Int. J. (2012)Google Scholar
  57. 57.
    Bryant Furlow, F.: Human neonatal cry quality as an honest signal of fitness. Evol. Hum. Behav. 18, 175–193 (1997)CrossRefGoogle Scholar
  58. 58.
    Kheddache, Y., Tadj, C.: Acoustic measures of the cry characteristics of healthy newborns and newborns with pathologies. J. Biomed. Sci. Eng. 06(08), 796–804 (2013)CrossRefGoogle Scholar
  59. 59.
    Orlandi, S., Manfredi, C., Bocchi, L., Scattoni, M.L.: Automatic newborn cry analysis: a non-invasive tool to help autism early diagnosis. In: 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 2953–2956 (2012)Google Scholar
  60. 60.
    Sheinkopf, S.J., Iverson, J.M., Rinaldi, M.L., Lester, B.M.: Atypical cry acoustics in 6-month-old infants at risk for autism spectrum disorder. Autism Res. 5(5), 331–339 (2012)CrossRefGoogle Scholar
  61. 61.
    Jeyaraman, S., Muthusamy, H., Wan, K., Jeyaraman, S., Nadarajaw, T., Yaacob, S., Nisha, S.: A review: survey on automatic infant cry analysis and classification. Health Technol. 8 (2018)CrossRefGoogle Scholar
  62. 62.
    IJzerman, H., Čolić, M., Hennecke, M., Hong, Y., Hu, C.-P., Joy-Gaba, J., Lazarevic, D., Lazarevic, L., Parzuchowski, M., Ratner, K.G., Schubert, T., Schuetz, A., Stojilovi, D., Weissgerber, S., Zickfeld, J., Lindenberg, S.: Does distance from the equator predict self-control? Lessons from the human penguin project. Behav. Brain Sci. 40 (2017)Google Scholar
  63. 63.
    IJzerman, H., Lindenberg, S., Dalgar, I., Weissgerber, S., Clemente Vergara, R., Cairo, A., oli, M., Dursun, P., Frankowska, N., Hadi, R., Hall, C., Hong, Y., Hu, C.-P., Joy-Gaba, J., Lazarevic, D., Lazarevic, L., Parzuchowski, M., Ratner, K.G., Rothman, D., Zickfeld J.: The human penguin project: climate, social integration, and core body temperature. Collabra: Psychol. 4 (2018)Google Scholar
  64. 64.
    Bishop, C.M.: Pattern Recognition and Machine Learning. Springer Science+Business Media (2006)Google Scholar
  65. 65.
    Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, (2016)
  66. 66.
    Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back propagating errors. Nature 323, 533–536 (1986)CrossRefGoogle Scholar
  67. 67.
    Huang, X., Acero, A., Hon, H.-W.: Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall PTR (2001)Google Scholar
  68. 68.
    Szegedy, C., Ioffe, S., Vanhoucke, V.: Inception-v4, inception-resnet and the impact of residual connections on learning. CoRR (2016) arXiv:1602.07261
  69. 69.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)Google Scholar
  70. 70.
    Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR, arXiv:1502.03167
  71. 71.
    Phan, H., Koch, P., Katzberg, F., Maaß, M., Mazur, R., Mertins, A.: Audio scene classification with deep recurrent neural networks. In: INTERSPEECH (2017)Google Scholar
  72. 72.
    Graves, A., Mohamed, A., Hinton, G.E.: Speech recognition with deep recurrent neural networks. CoRR (2013) arXiv:1303.5778
  73. 73.
    Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45(11), 2673–2681 (1997)CrossRefGoogle Scholar
  74. 74.
    Graves, A., Jaitly, N., Mohamed, A.: Hybrid speech recognition with deep bidirectional LSTM. In 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 273–278 (2013)Google Scholar
  75. 75.
    Ben-Yehuda, T., Abramovich, I., Cohen, R.: Low-complexity video classification using recurrent neural networks. In: 2018 International Conference on the Science of Electrical Engineering (ICSEE 2018) (2018)Google Scholar
  76. 76.
    Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRefGoogle Scholar
  77. 77.
    Graves, A.: Supervised Sequence Labelling with Recurrent Neural Networks, Series Studies in Computational Intelligence, vol. 385. Springer (2012)Google Scholar
  78. 78.
    Fei, H., Tan, F.: Bidirectional grid long short-term memory (bigridlstm): a method to address context-sensitivity and vanishing gradient. Algorithms 11 (2018)CrossRefGoogle Scholar
  79. 79.
    Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Teh, Y.W., Titterington, M. (eds.) Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Series Proceedings of Machine Learning Research, vol. 9, pp. 249–256 PMLR (2010)Google Scholar
  80. 80.
    Cohen, R., Lavner, Y.: Infant cry analysis and detection. In: 2012 IEEE 27th Convention of Electrical and Electronics Engineers in Israel (IEEEI 2012), pp. 2–6 (2012)Google Scholar
  81. 81.
    Noll, A.M.: Cepstrum pitch determination. J. Acoust. Soc. Am. 41(2), 293–309 (1967)CrossRefGoogle Scholar
  82. 82.
    van Waterschoot, T., Moonen, M.: Fifty years of acoustic feedback control: state of the art and future challenges. Proc. IEEE 99(2), 288–327 (2011)CrossRefGoogle Scholar
  83. 83.
    van Waterschoot, T., Moonen, M.: Comparative evaluation of howling detection criteria in notch-filter-based howling suppression. J. Audio Eng. Soc. 58(11), 923–940 (2010)Google Scholar
  84. 84.
    Rabiner, L.R., Schafer, R.W.: Theory and Applications of Digital Speech Processing, vol. 64. Pearson, Upper Saddle River (2011)Google Scholar
  85. 85.
    Quatieri, T.: Discrete-Time Speech Signal Processing: Principles and Practice. Prentice Hall, London (2002)Google Scholar
  86. 86.
    Platt, J.: Sequential minimal optimization: a fast algorithm for training support vector machines (1998)Google Scholar
  87. 87.
    Frederiks, K., Sterkenburg, P., Lavner, Y., Cohen, R., Ruinskiy, D., Verbeke, W., IJzerman, H.: Mobile social physiology as the future of relationship research and therapy: presentation of the bio-app for bonding (BAB), PsyArXiv (2018)Google Scholar
  88. 88.
    Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations, December 2014Google Scholar

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Rami Cohen
    • 1
    Email author
  • Dima Ruinskiy
    • 2
    • 3
  • Janis Zickfeld
    • 4
  • Hans IJzerman
    • 5
  • Yizhar Lavner
    • 2
  1. 1.Viterbi Faculty of Electrical EngineeringTechnion - Israel Institute of TechnologyHaifaIsrael
  2. 2.Department of Computer ScienceTel-Hai CollegeUpper GalileeIsrael
  3. 3.Network.IO Innovation LabHaifaIsrael
  4. 4.Department of PsychologyUniversity of OsloOsloNorway
  5. 5.LIP/PC2S, Université Grenoble AlpesGrenobleFrance

Personalised recommendations