Feature Extraction of Surround Sound Recordings for Acoustic Scene Classification

  • Sławomir K. Zieliński
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10842)

Abstract

This paper extends the traditional methodology of acoustic scene classification based on machine listening towards a new class of multichannel audio signals. It identifies a set of new features of five-channel surround recordings for the classification of two basic spatial audio scenes. Moreover, it compares three artificial-intelligence-based approaches to audio scene classification. The results indicate that the method based on the early fusion of features is superior to those involving the late fusion of signal metrics.
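The contrast between early and late fusion can be illustrated with a minimal sketch. Note that everything below is an assumption for illustration only: the channel layout follows a generic 5.0 surround configuration, while the feature set (simple signal statistics), the nearest-centroid classifier, the synthetic data, and all function names are hypothetical and are not the paper's actual features or models.

```python
# Hedged sketch: early vs. late fusion of per-channel features for
# two-class scene classification of five-channel surround recordings.
# Features, classifier, and data are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
CHANNELS = ["L", "R", "C", "Ls", "Rs"]  # 5.0 surround layout (assumed)

def extract_features(signal):
    """Placeholder per-channel feature vector (e.g. simple statistics)."""
    return np.array([signal.mean(), signal.std(),
                     np.abs(np.diff(signal)).mean(), signal.max()])

class NearestCentroid:
    """Toy stand-in for any classifier (not the paper's models)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array(
            [X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        # Distance from each sample to each class centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None], axis=2)
        return self.classes_[d.argmin(axis=1)]

def make_recording(scene):
    """Synthetic 5-channel recording; scene 1 has a shifted level."""
    offset = 0.0 if scene == 0 else 1.0
    return {ch: rng.normal(offset, 1.0, 1000) for ch in CHANNELS}

labels = np.array([0, 1] * 20)
recordings = [make_recording(s) for s in labels]
feats = {ch: np.array([extract_features(r[ch]) for r in recordings])
         for ch in CHANNELS}

# Early fusion: concatenate all channels' features, train one classifier.
X_early = np.hstack([feats[ch] for ch in CHANNELS])
early_pred = NearestCentroid().fit(X_early, labels).predict(X_early)

# Late fusion: one classifier per channel, then majority-vote the outputs.
votes = np.array([NearestCentroid().fit(feats[ch], labels).predict(feats[ch])
                  for ch in CHANNELS])
late_pred = (votes.mean(axis=0) > 0.5).astype(int)
```

The structural difference is where the channels are combined: early fusion merges evidence before classification (one model sees the whole feature vector), whereas late fusion classifies each channel independently and merges only the decisions.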

Keywords

Machine listening · Acoustic scene classification · Feature extraction · Ensemble-based classifiers

Acknowledgements

This work was supported by grant S/WI/1/2013 from Bialystok University of Technology and funded from the resources for research of the Ministry of Science and Higher Education.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Faculty of Computer Science, Białystok University of Technology, Białystok, Poland