Sound learning–based event detection for acoustic surveillance sensors

  • Jeong-Sik Park
  • Seok-Hoon Kim (corresponding author)


This study proposes an event detection technique for acoustic surveillance that uses acoustic sensors to detect emergency situations. Most surveillance systems have depended heavily on visual data recorded by closed-circuit television (CCTV) cameras, but more intelligent systems are now beginning to use audio information to detect emergencies more reliably. Most conventional studies on acoustic event detection use only a limited range of acoustic data and rely on simple algorithms, such as energy-based determination. These approaches are easy to implement, but they may produce serious detection errors in real-world applications. In this study, we propose an event detection technique based on a sound-learning algorithm for use in real-time acoustic surveillance systems. A central step of this technique is to construct acoustic models by applying learning algorithms to sound data collected for each type of acoustic event. The models are then used to determine whether an audio stream entering an acoustic sensor corresponds to one of the events. In event detection experiments performed in an outdoor environment, the proposed approach outperformed conventional approaches in the real-time detection of acoustic events.
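
The model-based decision described in the abstract can be illustrated with a minimal sketch. The abstract does not name the features or the model family, so everything below is an assumption made for illustration: each frame is reduced to two toy features (log energy and zero-crossing rate), each class is modeled by a single diagonal Gaussian, and a frame is flagged when the log-likelihood ratio between the event model and a background model exceeds a threshold. The function names, the synthetic signals, and the threshold are hypothetical and are not taken from the paper.

```python
import math
import random

def features(frame):
    """Toy feature vector (an assumption): [log energy, zero-crossing rate]."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [math.log(energy + 1e-12), crossings / (len(frame) - 1)]

def fit_diag_gaussian(vectors):
    """Learn per-dimension mean and variance from training feature vectors."""
    n, dims = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    variances = [
        max(sum((v[d] - means[d]) ** 2 for v in vectors) / n, 1e-4)  # variance floor
        for d in range(dims)
    ]
    return means, variances

def log_likelihood(vec, model):
    """Log density of vec under a diagonal Gaussian model."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        for x, mean, var in zip(vec, means, variances)
    )

def is_event(frame, event_model, background_model, threshold=0.0):
    """Flag the frame when the event model explains it better than the background."""
    vec = features(frame)
    llr = log_likelihood(vec, event_model) - log_likelihood(vec, background_model)
    return llr > threshold

def synth_frame(rng, amplitude, freq, n=256, rate=8000):
    """Hypothetical test signal: a sine tone plus weak Gaussian noise."""
    return [
        amplitude * math.sin(2 * math.pi * freq * t / rate) + rng.gauss(0, 0.01)
        for t in range(n)
    ]

# Train one model per class from synthetic data (stand-ins for collected recordings).
rng = random.Random(0)
event_train = [
    features(synth_frame(rng, rng.uniform(0.3, 0.7), rng.uniform(800, 1200)))
    for _ in range(50)
]
background_train = [features(synth_frame(rng, 0.0, 0.0)) for _ in range(50)]
event_model = fit_diag_gaussian(event_train)
background_model = fit_diag_gaussian(background_train)

# Classify two unseen frames: a loud tone and plain background noise.
print(is_event(synth_frame(rng, 0.5, 1000.0), event_model, background_model))
print(is_event(synth_frame(rng, 0.0, 0.0), event_model, background_model))
```

Unlike a fixed energy threshold, the decision boundary here is learned from labeled examples of each class, which is the general idea the abstract contrasts with energy-based determination. A realistic system would presumably use richer features and models than this toy setup.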


Acoustic surveillance sensor; Surveillance system; Sound learning; Acoustic event detection



This research was supported by the Hankuk University of Foreign Studies Research Fund; the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2017R1D1A1A09000903); the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2019-2016-0-00313) supervised by the IITP (Institute for Information & communications Technology Planning & Evaluation); and the research grant of Pai Chai University in 2019.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of English Linguistics & Language Technology, Hankuk University of Foreign Studies, Seoul, Republic of Korea
  2. Department of Electronic Commerce, Paichai University, Daejeon, Republic of Korea
