Automatic Sound Recognition of Urban Environment Events

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9319)


The audio analysis of speaker’s surroundings has been a first step for several processing systems that enable speaker’s mobility though his daily life. These algorithms usually operate in a short-time analysis decomposing the incoming events in time and frequency domain. In this paper, an automatic sound recognizer is studied, which investigates audio events of interest from urban environment. Our experiments were conducted using a close set of audio events from which well known and commonly used audio descriptors were extracted and models were training using powerful machine learning algorithms. The best urban sound recognition performance was achieved by SVMs with accuracy equal to approximately 93 %.


Automatic sound recognition Urban environment Dimensionality redundancy 


  1. 1.
    The BBC sound effects library original series.
  2. 2.
    Aucouturier, J.J., Defreville, B., Pachet, F.: The bag-of-frames approach to audio pattern recognition: a sufficient model for urban soundscapes but not for polyphonic music. J. Acoust. Soc. Am. 122(2), 881–891 (2007)CrossRefGoogle Scholar
  3. 3.
    Bartsch, M.A., Wakefield, G.H.: Audio thumbnailing of popular music using chroma-based representations. IEEE Trans. Multimedia 7(1), 96–104 (2005)CrossRefGoogle Scholar
  4. 4.
    Casey, M.: General sound classification and similarity in MPEG-7. Organised Sound 6(02), 153–164 (2001)CrossRefGoogle Scholar
  5. 5.
    Couvreur, L., Laniray, M.: Automatic noise recognition in urban environments based on artificial neural networks and hidden markov models. InterNoise, Prague, Czech Republic, pp. 1–8 (2004)Google Scholar
  6. 6.
    Dogan, E., Sert, M., Yazici, A.: Content-based classification and segmentation of mixed-type audio by using mpeg-7 features. In:First International Conference on Advances in Multimedia, MMEDIA 2009, pp. 152–157. IEEE (2009)Google Scholar
  7. 7.
    Eyben, F., Wöllmer, M., Schuller, B.: Opensmile: the munich versatile and fast open-source audio feature extractor. In: Proceedings of the international conference on Multimedia, pp. 1459–1462. ACM (2010)Google Scholar
  8. 8.
    Fernandez, L.P.S., Ruiz, A.R., de JM Juarez, J.: Urban noise permanent monitoring and pattern recognition. In: Proceedings of the European Conference of Communications-ECCOM, vol. 10, pp. 143–148 (2010)Google Scholar
  9. 9.
    Huang, R., Hansen, J.H.: Advances in unsupervised audio classification and segmentation for the broadcast news and NGSW corpora. IEEE Trans. Audio Speech Lang. Process. 14(3), 907–919 (2006)CrossRefGoogle Scholar
  10. 10.
    Khunarsal, P., Lursinsap, C., Raicharoen, T.: Very short time environmental sound classification based on spectrogram pattern matching. Inf. Sci. 243, 57–74 (2013)CrossRefGoogle Scholar
  11. 11.
    Kim, H.G., Moreau, N., Sikora, T.: Audio classification based on MPEG-7 spectral basis representations. IEEE Trans. Circuits Syst. Video Technol. 14(5), 716–725 (2004)CrossRefGoogle Scholar
  12. 12.
    Kinnunen, T., Saeidi, R., Leppänen, J., Saarinen, J.P.: Audio context recognition in variable mobile environments from short segments using speaker and language recognizers. In: The Speaker and Language Recognition Workshop, pp. 301–311 (2012)Google Scholar
  13. 13.
    Lee, K., Slaney, M.: Automatic chord recognition from audio using a HMM with supervised learning. In: ISMIR, pp. 133–137 (2006)Google Scholar
  14. 14.
    Lu, H., Pan, W., Lane, N.D., Choudhury, T., Campbell, A.T.: Soundsense: scalable sound sensing for people-centric applications on mobile phones. In: Proceedings of the 7th international conference on Mobile systems, applications, and services, pp. 165–178. ACM (2009)Google Scholar
  15. 15.
    Ntalampiras, S.: Universal background modeling for acoustic surveillance of urban traffic. Digital Signal Process. 31, 69–78 (2014)CrossRefGoogle Scholar
  16. 16.
    Ntalampiras, S., Potamitis, I., Fakotakis, N.: Exploiting temporal feature integration for generalized sound recognition. EURASIP J. Adv. Sig. Process. 2009(1), 807162 (2009)CrossRefGoogle Scholar
  17. 17.
    Patsis, Y., Verhelst, W.: A speech/music/silence/garbage/classifier for searching and indexing broadcast news material. In: 19th International Workshop on Database and Expert Systems Application, DEXA 2008, pp. 585–589. IEEE (2008)Google Scholar
  18. 18.
    Salamon, J., Jacoby, C., Bello, J.P.: A dataset and taxonomy for urban sound research. In: Proceedings of the ACM International Conference on Multimedia, pp. 1041–1044. ACM (2014)Google Scholar
  19. 19.
    Slaney, M.: Auditory toolbox. Interval Research Corporation. Technical report vol. 10 (1998)Google Scholar
  20. 20.
    Smith, J.W., Pijanowski, B.C.: Human and policy dimensions of soundscape ecology. Global Environ. Change 28, 63–74 (2014)CrossRefGoogle Scholar
  21. 21.
    Torija, A., Diego, P.R., Ramos-Ridao, A.: Ann-based m events. a too against envi environment (2011)Google Scholar
  22. 22.
    Tran, H.D., Li, H.: Sound event recognition with probabilistic distance SVMs. IEEE Trans. Audio Speech Lang. Process. 19(6), 1556–1568 (2011)CrossRefGoogle Scholar
  23. 23.
    Valero, X., Alías, F., Oldoni, D., Botteldooren, D.: Support vector machines and self-organizing maps for the recognition of sound events in urban soundscapes. In: 41st International Congress and Exposition on Noise Control Engineering (Inter-Noise-2012). Institute of Noise Control Engineering (2012)Google Scholar
  24. 24.
    Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann (2005)Google Scholar

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. 1.Artificial Intelligent Group, Wire Communication Laboratory, Department of Electrical and Computer EngineeringUniversity of PatrasRion-patrasGreece
  2. 2.Computer and Informatics Engineering DepartmentTechnological Educational Institute of Western GreeceAntirioGreece

Personalised recommendations