Adavanne, S., Parascandolo, G., Pertilä, P., Heittola, T., Virtanen, T.: Sound event detection in multichannel audio using spatial and harmonic features. In: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016), pp. 6–10 (2016)
Bae, S.H., Choi, I., Kim, N.S.: Acoustic scene classification using parallel combination of LSTM and CNN. In: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016), pp. 11–15 (2016)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
Brown, J.C.: Calculation of a constant Q spectral transform. J. Acoust. Soc. Am. 89(1), 425–434 (1991)
Çakır, E., Heittola, T., Huttunen, H., Virtanen, T.: Polyphonic sound event detection using multi label deep neural networks. In: The International Joint Conference on Neural Networks 2015 (IJCNN 2015) (2015)
Çakır, E., Heittola, T., Virtanen, T.: Domestic audio tagging with convolutional neural networks. Technical report, DCASE2016 Challenge (2016)
Davis, S.B., Mermelstein, P.: Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust. Speech Signal Process. 28, 357–366 (1980)
Dieleman, S., Schrauwen, B.: End-to-end learning for music audio. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6964–6968. IEEE (2014). doi:10.1109/ICASSP.2014.6854950
Du, K.L., Swamy, M.N.: Neural Networks and Statistical Learning. Springer Publishing Company, Incorporated, New York (2013)
Espi, M., Fujimoto, M., Kubo, Y., Nakatani, T.: Spectrogram patch based acoustic event detection and classification in speech overlapping conditions. In: 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), pp. 117–121 (2014)
Espi, M., Fujimoto, M., Kinoshita, K., Nakatani, T.: Exploiting spectro-temporal locality in deep learning based acoustic event detection. EURASIP J. Audio Speech Music Process. 2015(1), 26 (2015)
Gold, B., Morgan, N., Ellis, D.: Speech and Audio Signal Processing: Processing and Perception of Speech and Music, 2nd edn. Wiley-Interscience, New York, NY (2011)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
Graves, A., Jaitly, N.: Towards end-to-end speech recognition with recurrent neural networks. In: Proceedings of the 31st International Conference on Machine Learning (ICML-14), ICML’14, vol. 14, pp. 1764–1772. JMLR Workshop and Conference Proceedings (2014)
Hawkins, D.M.: The problem of overfitting. J. Chem. Inf. Comput. Sci. 44(1), 1–12 (2004)
Heittola, T., Mesaros, A., Virtanen, T., Gabbouj, M.: Supervised model training for overlapping sound events based on unsupervised source separation. In: 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), pp. 8677–8681 (2013)
Hertel, L., Phan, H., Mertins, A.: Comparing time and frequency domain for audio event recognition using deep learning. In: Proceedings IEEE International Joint Conference on Neural Networks (IJCNN 2016), pp. 3407–3411 (2016)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. CoRR abs/1502.03167 (2015)
Kumar, A., Raj, B.: Audio event detection using weakly labeled data. In: Proceedings of the 2016 ACM on Multimedia Conference, MM’16, pp. 1038–1047. ACM, New York (2016)
Li, J., Deng, L., Haeb-Umbach, R., Gong, Y.: Robust Automatic Speech Recognition — A Bridge to Practical Applications, 1st edn., 306 pp. Elsevier, Amsterdam (2015)
Lim, H., Kim, M.J., Kim, H.: Cross-acoustic transfer learning for sound event classification. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2504–2508 (2016)
Murphy, K.P.: Machine Learning: A Probabilistic Perspective. The MIT Press, Cambridge (2012)
Ng, A.Y., Jordan, M.I.: On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. Adv. Neural Inf. Process. Syst. 14, 841 (2002)
Nowlan, S.J., Hinton, G.E.: Simplifying neural networks by soft weight-sharing. Neural Comput. 4(4), 473–493 (1992)
Oppenheim, A.V., Schafer, R.W., Buck, J.R.: Discrete-Time Signal Processing. Prentice Hall, Upper Saddle River, NJ (1999)
Parascandolo, G., Huttunen, H., Virtanen, T.: Recurrent neural networks for polyphonic sound event detection in real life recordings. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6440–6444 (2016)
Petetin, Y., Laroche, C., Mayoue, A.: Deep neural networks for audio scene recognition. In: 23rd European Signal Processing Conference (EUSIPCO), pp. 125–129. IEEE, New York (2015)
Piczak, K.J.: Environmental sound classification with convolutional neural networks. In: IEEE International Workshop on Machine Learning for Signal Processing (2015)
Salamon, J., Bello, J.P.: Unsupervised feature learning for urban sound classification. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 171–175 (2015)
Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015)
Schröder, J., Moritz, N., Schädler, M.R., Cauchi, B., Adiloglu, K., Anemüller, J., Doclo, S., Kollmeier, B., Goetze, S.: On the use of spectro-temporal features for the IEEE AASP challenge ‘detection and classification of acoustic scenes and events’. In: 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 1–4 (2013)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
Stevens, S.S., Volkmann, J.: The relation of pitch to frequency: a revised scale. Am. J. Psychol. 53, 329–353 (1940)
Stevens, S.S., Volkmann, J., Newman, E.B.: A scale for the measurement of the psychological magnitude pitch. J. Acoust. Soc. Am. 8(3), 185–190 (1937)
Tzanetakis, G., Essl, G., Cook, P.R.: Audio analysis using the discrete wavelet transform. In: Proceedings of the WSES International Conference Acoustics and Music: Theory and Applications (AMTA 2001), Skiathos, pp. 318–323 (2001)
Valenti, M., Diment, A., Parascandolo, G., Squartini, S., Virtanen, T.: DCASE 2016 acoustic scene classification using convolutional neural networks. In: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016), pp. 95–99 (2016)
Van der Aalst, W.M., Rubin, V., Verbeek, H., van Dongen, B.F., Kindler, E., Günther, C.W.: Process mining: a two-step approach to balance between underfitting and overfitting. Softw. Syst. Model. 9(1), 87–111 (2010)
Xu, Y., Huang, Q., Wang, W., Jackson, P.J.B., Plumbley, M.D.: Fully DNN-based multi-label regression for audio tagging. In: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016), pp. 105–109 (2016)
Xu, Y., Huang, Q., Wang, W., Foster, P., Sigtia, S., Jackson, P.J.B., Plumbley, M.D.: Unsupervised feature learning based on deep models for environmental audio tagging. IEEE/ACM Trans. Audio Speech Lang. Process. 25, 1230–1241 (2017)
Tokozume, Y., Harada, T.: Learning environmental sounds with end-to-end convolutional neural network. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2721–2725 (2017)
Zhao, X., Wang, Y., Wang, D.: Robust speaker identification in noisy and reverberant conditions. IEEE/ACM Trans. Audio Speech Lang. Process. 22(4), 836–845 (2014)
Zölzer, U. (ed.): Digital Audio Signal Processing, 2nd edn. Wiley, New York (2008)