Statistical Methods for Scene and Event Classification

Chapter in: Computational Analysis of Sound Scenes and Events

Abstract

This chapter surveys methods for pattern classification in audio data. Broadly speaking, these methods take as input some representation of audio, typically the raw waveform or a time-frequency spectrogram, and produce semantically meaningful classification of its contents. We begin with a brief overview of statistical modeling, supervised machine learning, and model validation. This is followed by a survey of discriminative models for binary and multi-class classification problems. Next, we provide an overview of generative probabilistic models, including both maximum likelihood and Bayesian parameter estimation. We focus specifically on Gaussian mixture models and hidden Markov models, and their application to audio and time-series data. We then describe modern deep learning architectures, including convolutional networks, different variants of recurrent neural networks, and hybrid models. Finally, we survey model-agnostic techniques for improving the stability of classifiers.

Notes

  1. The notation \( \mathbf{P}_{\mathcal{D}} \) denotes the probability mass (or density) with respect to distribution \( \mathcal{D} \), and \( \mathbf{E}_{\mathcal{D}} \) denotes the expectation with respect to distribution \( \mathcal{D} \).

  2. Quantifying the relationships between (5.5), (5.4), and (5.3) lies within the purview of statistics and computational learning theory, and is beyond the scope of this text. We refer interested readers to [48, 106] for an introduction to the subject.

  3. SVM scores can be converted into probabilities via Platt scaling [94] or isotonic regression [122], but these methods require additional modeling and calibration.
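
As an illustrative sketch (not code from the chapter), both calibration methods are available through scikit-learn's CalibratedClassifierCV, which fits the calibration map on held-out decision scores via internal cross-validation; exact argument names may vary slightly across library versions.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Toy data standing in for audio features; purely illustrative.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# method="sigmoid" is Platt scaling [94]; method="isotonic" gives isotonic regression [122].
clf = CalibratedClassifierCV(LinearSVC(dual=False), method="sigmoid", cv=5)
clf.fit(X, y)

proba = clf.predict_proba(X[:5])  # calibrated class probabilities
```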

  4. The notion of independence for multi-label problems will be treated more thoroughly when we develop deep learning models.

  5. The factor of 1/n is not strictly necessary here, but is included for consistency with (5.6).

  6. \( \mathbb{S}_{++}^{d} \) denotes the set of d × d positive definite matrices: Hermitian matrices with strictly positive eigenvalues.

  7. A probability distribution P[θ] is a conjugate prior if the posterior P[θ | S] has the same form as the prior P[θ] [97].
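
For a standard illustration (not specific to this chapter), the Beta distribution is conjugate to the Bernoulli likelihood: with prior \( \mathbf{P}[\theta] = \mathrm{Beta}(\theta \mid a, b) \propto \theta^{a-1}(1-\theta)^{b-1} \) and observations \( S = \{x_1, \ldots, x_n\} \) with \( x_i \in \{0, 1\} \), the posterior is \( \mathbf{P}[\theta \mid S] = \mathrm{Beta}\left(\theta \mid a + \sum_i x_i,\; b + n - \sum_i x_i\right) \), which has the same (Beta) form as the prior.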

  8. Note that although we use T to denote the length of an arbitrary sequence x, it is not required that all sequences have the same length.

  9. For ease of notation, we denote the initial state distribution as \( \mathbf{P}_{\theta }\left [z[1]\,\middle\vert \,z[0]\right ] \), rather than the unconditional form \( \mathbf{P}_{\theta }\left [z[1]\right ] \).

  10. The well-known Baum–Welch algorithm for HMM parameter estimation is a special case of expectation-maximization [96].

  11. Some authors refer to the layer dimension \( d_i \) as width. This terminology can be confusing when applied to spatio-temporal data as in Sect. 5.4.3, so we will use dimension to indicate \( d_i \) and retain width to describe a spatial or temporal extent of data.

  12. To see this, observe that if \( \rho_i \) is omitted, then the full model f(x | θ) is a composition of affine functions, which is itself an affine function, albeit one with rank constraints imposed by the sequence of layer dimensions.
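
Concretely, for two layers (with illustrative notation, not the chapter's equations), \( W^{(2)}\left(W^{(1)}x + b^{(1)}\right) + b^{(2)} = \left(W^{(2)}W^{(1)}\right)x + \left(W^{(2)}b^{(1)} + b^{(2)}\right) \), which is again affine in x; iterating this argument collapses any stack of purely linear layers into a single affine map.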

  13. Note that batch normalization accomplishes this scaling implicitly by estimating these statistics during training [68].
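
A minimal sketch of the idea (illustrative NumPy, not the chapter's notation): during training, each feature is standardized using statistics estimated from the current mini-batch before a learned scale and shift are applied.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    # x has shape (batch, features); statistics come from the mini-batch itself.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)  # approximately zero mean, unit variance per feature
    return gamma * x_hat + beta            # learned rescaling and shift
```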

  14. A valid-mode convolution is one in which the response is computed only at positions where the signal z and filter w fully overlap. For \( z \in \mathbb{R}^{T} \) and \( w \in \mathbb{R}^{n} \), the valid convolution satisfies \( w {\ast} z \in \mathbb{R}^{T-n+1} \).
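
As a quick shape check (illustrative NumPy, not code from the chapter):

```python
import numpy as np

T, n = 100, 9
z = np.random.randn(T)               # signal of length T
w = np.random.randn(n)               # filter of length n

y = np.convolve(z, w, mode="valid")  # responses only where w and z fully overlap
assert y.shape == (T - n + 1,)       # length T - n + 1, as stated above
```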

  15. Technically, (5.54) is written as a cross-correlation and not a convolution. However, since the weights w are variables to be learned, and all quantities are real-valued, the distinction is not important.
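
The two operations differ only by a time-reversal of the filter, so a learned filter absorbs the difference; a small NumPy check (illustrative, not from the chapter):

```python
import numpy as np

z = np.random.randn(50)
w = np.random.randn(5)

# Cross-correlating z with w equals convolving z with the reversed filter w[::-1].
xcorr = np.correlate(z, w, mode="valid")
conv = np.convolve(z, w[::-1], mode="valid")
assert np.allclose(xcorr, conv)
```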

  16. A key distinction between recurrent networks and HMMs is that the “state space” in a recurrent network is continuous, i.e., \( h_{t} \in \mathbb{R}^{d_{i}} \).
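
To make this concrete, a minimal Elman-style recurrence is sketched below; the weight names and dimensions are illustrative assumptions, not the chapter's definitions. The hidden state is a real-valued vector at every step, rather than one of finitely many discrete states as in an HMM.

```python
import numpy as np

def elman_step(h_prev, x_t, W, U, b):
    # One recurrence step: the new hidden state is a continuous vector in R^d.
    return np.tanh(W @ h_prev + U @ x_t + b)

d, m = 8, 4
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(d, d)), rng.normal(size=(d, m)), np.zeros(d)

h = np.zeros(d)
for x_t in rng.normal(size=(10, m)):  # a length-10 input sequence
    h = elman_step(h, x_t, W, U, b)   # h remains in R^d throughout
```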

  17. The presentation of Graves [57] differs slightly in its inclusion of “peephole” connections [51]. We omit these connections here for clarity of presentation, and because recent studies have not demonstrated their efficacy [60].

  18. Some authors define the BRNN output (5.62) as a non-linear transformation of the concatenated state vectors [55]. This formulation is equivalent to (5.62) followed by a one-dimensional convolutional layer with a receptive field \( n_i = 1 \), so we opt for the simpler definition here.

References

  1. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). http://tensorflow.org/, Software available from tensorflow.org

  2. Akaike, H.: Likelihood of a model and information criteria. J. Econom. 16(1), 3–14 (1981)

  3. Amodei, D., Anubhai, R., Battenberg, E., Case, C., Casper, J., Catanzaro, B., Chen, J., Chrzanowski, M., Coates, A., Diamos, G., et al.: Deep speech 2: end-to-end speech recognition in English and Mandarin (2015). arXiv preprint arXiv:1512.02595

  4. Andrieu, C., De Freitas, N., Doucet, A., Jordan, M.I.: An introduction to MCMC for machine learning. Mach. Learn. 50(1–2), 5–43 (2003)

  5. Antoniak, C.E.: Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Stat. 2, 1152–1174 (1974)

  6. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate (2014). arXiv preprint arXiv:1409.0473

  7. Baum, L.E., Petrie, T.: Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Stat. 37(6), 1554–1563 (1966)

  8. Beal, M.J.: Variational algorithms for approximate Bayesian inference. University of London (2003)

  9. Bellet, A., Habrard, A., Sebban, M.: A survey on metric learning for feature vectors and structured data (2013). arXiv preprint arXiv:1306.6709

  10. Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)

  11. Bergstra, J., Bastien, F., Breuleux, O., Lamblin, P., Pascanu, R., Delalleau, O., Desjardins, G., Warde-Farley, D., Goodfellow, I., Bergeron, A., et al.: Theano: deep learning on GPUs with python. In: Big Learn, Neural Information Processing Systems Workshop (2011)

  12. Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13(Feb), 281–305 (2012)

  13. Bergstra, J.S., Bardenet, R., Bengio, Y., Kégl, B.: Algorithms for hyper-parameter optimization. In: Advances in Neural Information Processing Systems, pp. 2546–2554 (2011)

  14. Bickel, S., Brückner, M., Scheffer, T.: Discriminative learning under covariate shift. J. Mach. Learn. Res. 10(Sep), 2137–2155 (2009)

  15. Blei, D.M., Jordan, M.I., et al.: Variational inference for Dirichlet process mixtures. Bayesian Anal. 1(1), 121–144 (2006)

  16. Blitzer, J., McDonald, R., Pereira, F.: Domain adaptation with structural correspondence learning. In: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pp. 120–128. Association for Computational Linguistics, Trento (2006)

  17. Böck, S., Schedl, M.: Enhanced beat tracking with context-aware neural networks. In: Proceedings of the International Conference on Digital Audio Effects (2011)

  18. Bottou, L.: Stochastic gradient learning in neural networks. Proc. Neuro-Nımes 91(8), 687–696 (1991)

  19. Boulanger-Lewandowski, N., Bengio, Y., Vincent, P.: Audio chord recognition with recurrent neural networks. In: Proceedings of the International Conference on Music Information Retrieval, pp. 335–340. Citeseer (2013)

  20. Boulanger-Lewandowski, N., Droppo, J., Seltzer, M., Yu, D.: Phone sequence modeling with recurrent neural networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5417–5421. IEEE, New York (2014)

  21. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

  22. Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and regression trees. CRC Press, New York (1984)

  23. Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M.A., Guo, J., Li, P., Riddell, A.: Stan: a probabilistic programming language. J. Stat. Softw. 20, 1–37 (2016)

  24. Chapelle, O., Scholkopf, B., Zien, A.: Semi-supervised learning. IEEE Trans. Neural Netw. 20(3), 542–542 (2009)

  25. Cho, K., van Merrienboer, B., Gülçehre, Ç., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP, pp. 1724–1734 (2014)

  26. Cho, K., Van Merriënboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: encoder-decoder approaches (2014). arXiv preprint arXiv:1409.1259

  27. Chollet, F.: Keras. https://github.com/fchollet/keras (2015). Retrieved on 2017-01-02.

  28. Collobert, R., Kavukcuoglu, K., Farabet, C.: Torch7: a matlab-like environment for machine learning. In: Big Learn, Neural Information Processing Systems Workshop, EPFL-CONF-192376 (2011)

  29. Cortes, C., Mohri, M.: Domain adaptation and sample bias correction theory and algorithm for regression. Theor. Comput. Sci. 519, 103–126 (2014)

  30. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)

  31. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, New York (2012)

  32. Cox, D.R.: The regression analysis of binary sequences. J. R. Stat. Soc. Ser. B Methodol. 20, 215–242 (1958)

  33. Crammer, K., Singer, Y.: On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2(Dec), 265–292 (2001)

  34. Cui, X., Goel, V., Kingsbury, B.: Data augmentation for deep neural network acoustic modeling. IEEE/ACM Trans. Audio Speech Lang. Process. (TASLP) 23(9), 1469–1477 (2015)

  35. Dasarathy, B.V.: Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. IEEE Computer Society Press, Los Alamitos (1991)

  36. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodological) 39, 1–38 (1977)

  37. Dieleman, S., Schrauwen, B.: End-to-end learning for music audio. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6964–6968. IEEE, New York (2014)

  38. Dieleman, S., Schlüter, J., Raffel, C., Olson, E., Sønderby, S.K., Nouri, D., Maturana, D., Thoma, M., Battenberg, E., Kelly, J., Fauw, J.D., Heilman, M., Diogo149, McFee, B., Weideman, H., Takacsg84, Peterderivaz, Jon, Instagibbs, Rasul, D.K., CongLiu, Britefury, Degrave, J.: Lasagne: first release (2015). doi:10.5281/zenodo.27878. https://doi.org/10.5281/zenodo.27878

  39. Dietterich, T.G.: Ensemble learning. In: The Handbook of Brain Theory and Neural Networks, 2nd edn., pp. 110–125. MIT Press, Cambridge, MA (2002)

  40. Dietterich, T.G., Lathrop, R.H., Lozano-Pérez, T.: Solving the multiple instance problem with axis-parallel rectangles. Artif. Intell. 89(1), 31–71 (1997)

  41. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12(Jul), 2121–2159 (2011)

  42. Elkan, C., Noto, K.: Learning classifiers from only positive and unlabeled data. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 213–220. ACM, New York (2008)

  43. Elman, J.L.: Finding structure in time. Cogn. Sci. 14(2), 179–211 (1990)

  44. Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: Liblinear: a library for large linear classification. J. Mach. Learn. Res. 9(Aug), 1871–1874 (2008)

  45. Feldman, V., Guruswami, V., Raghavendra, P., Wu, Y.: Agnostic learning of monomials by halfspaces is hard. In: Proceedings of the 2009 50th Annual IEEE Symposium on Foundations of Computer Science, pp. 385–394. IEEE Computer Society, New York (2009)

  46. Fernando, B., Habrard, A., Sebban, M., Tuytelaars, T.: Unsupervised visual domain adaptation using subspace alignment. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2960–2967 (2013)

  47. Fix, E., Hodges, J.L. Jr.: Discriminatory analysis-nonparametric discrimination: consistency properties. Technical Report, DTIC Document (1951)

  48. Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning. Springer Series in Statistics, vol. 1. Springer, Berlin (2001)

  49. Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., Lempitsky, V.: Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17(59), 1–35 (2016). http://jmlr.org/papers/v17/15-239.html

  50. Gelfand, A.E., Smith, A.F.: Sampling-based approaches to calculating marginal densities. J. Am. Stat. Assoc. 85(410), 398–409 (1990)

  51. Gers, F.A., Schmidhuber, J.: Recurrent nets that time and count. In: Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks, vol. 3, pp. 189–194. IEEE, New York (2000)

  52. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: International Conference on Artificial Intelligence and Statistics, vol. 9, pp. 249–256 (2010)

  53. Gong, B., Shi, Y., Sha, F., Grauman, K.: Geodesic flow kernel for unsupervised domain adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2066–2073 (2012)

  54. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge, MA (2016). http://www.deeplearningbook.org

  55. Graves, A.: Sequence transduction with recurrent neural networks. CoRR abs/1211.3711 (2012). http://arxiv.org/abs/1211.3711

  56. Graves, A.: Supervised Sequence Labelling with Recurrent Neural Networks. Springer, Berlin (2012)

  57. Graves, A.: Generating sequences with recurrent neural networks (2013). arXiv preprint arXiv:1308.0850

  58. Graves, A., Jaitly, N.: Towards end-to-end speech recognition with recurrent neural networks. In: Proceedings of the International Conference on Machine Learning, vol. 14, pp. 1764–1772 (2014)

  59. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18(5), 602–610 (2005)

  60. Greff, K., Srivastava, R.K., Koutník, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: a search space odyssey (2015). arXiv preprint arXiv:1503.04069

  61. Gretton, A., Smola, A., Huang, J., Schmittfull, M., Borgwardt, K., Schölkopf, B.: Covariate shift by kernel mean matching. Dataset Shift Mach. Learn. 3(4), 5 (2009)

  62. Hastings, W.K.: Monte carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97–109 (1970)

  63. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034 (2015)

  64. Heittola, T., Mesaros, A., Eronen, A., Virtanen, T.: Context-dependent sound event detection. EURASIP J. Audio Speech Music Process. 2013(1), 1–13 (2013)

  65. Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., et al.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012)

  66. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

  67. Humphrey, E.J., Bello, J.P.: Rethinking automatic chord recognition with convolutional neural networks. In: 2012 11th International Conference on Machine Learning and Applications (ICMLA), vol. 2, pp. 357–362. IEEE, New York (2012)

  68. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp. 448–456 (2015)

  69. Ishwaran, H., Zarepour, M.: Exact and approximate sum representations for the Dirichlet process. Can. J. Stat. 30(2), 269–283 (2002)

  70. Jaitly, N., Hinton, G.E.: Vocal tract length perturbation (VTLP) improves speech recognition. In: Proceedings of ICML Workshop on Deep Learning for Audio, Speech and Language (2013)

  71. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding (2014). arXiv preprint arXiv:1408.5093

  72. Józefowicz, R., Zaremba, W., Sutskever, I.: An empirical exploration of recurrent network architectures. In: Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, 6–11 July 2015, pp. 2342–2350 (2015). http://jmlr.org/proceedings/papers/v37/jozefowicz15.html

  73. Jurafsky, D., Martin, J.H.: Speech and language processing: an introduction to speech recognition. Computational Linguistics and Natural Language Processing. Prentice Hall, Upper Saddle River (2008)

  74. Kearns, M.J.: The Computational Complexity of Machine Learning. MIT Press, Cambridge (1990)

  75. Kingma, D., Ba, J.: Adam: a method for stochastic optimization (2014). arXiv preprint arXiv:1412.6980

  76. Ko, T., Peddinti, V., Povey, D., Khudanpur, S.: Audio augmentation for speech recognition. In: Proceedings of INTERSPEECH (2015)

  77. Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)

  78. Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the International Conference on Machine Learning, ICML, vol. 1, pp. 282–289 (2001)

  79. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)

  80. Lee, H., Pham, P., Largman, Y., Ng, A.Y.: Unsupervised feature learning for audio classification using convolutional deep belief networks. In: Advances in Neural Information Processing Systems, pp. 1096–1104 (2009)

  81. Maas, A.L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of ICML Workshop on Deep Learning for Audio, Speech, and Language Processing (2013)

  82. MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, vol. 1, no. 14, pp. 281–297 (1967)

  83. McFee, B., Humphrey, E.J., Bello, J.P.: A software framework for musical data augmentation. In: International Society for Music Information Retrieval Conference (ISMIR) (2015)

  84. Mesnil, G., He, X., Deng, L., Bengio, Y.: Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In: Proceedings of INTERSPEECH, pp. 3771–3775 (2013)

  85. Murphy, K.P.: Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge (2012)

  86. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814 (2010)

  87. Neal, R.M.: Probabilistic inference using Markov chain monte carlo methods. Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto, Toronto, Ontario (1993)

  88. Nesterov, Y.: A method of solving a convex programming problem with convergence rate O (1/k2). Sov. Math. Dokl. 27(2), 372–376 (1983)

  89. Parascandolo, G., Huttunen, H., Virtanen, T.: Recurrent neural networks for polyphonic sound event detection in real life recordings. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6440–6444. IEEE, New York (2016)

  90. Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. In: Proceedings of the 30th International Conference on Machine Learning, ICML (3), vol. 28, pp. 1310–1318 (2013)

  91. Pearson, K.: Contributions to the mathematical theory of evolution. Philos. Trans. R. Soc. Lond. A 185, 71–110 (1894)

  92. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al.: Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12(Oct), 2825–2830 (2011)

  93. Piczak, K.J.: Environmental sound classification with convolutional neural networks. In: 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6. IEEE, New York (2015)

  94. Platt, J., et al.: Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv. Large Margin Classif. 10(3), 61–74 (1999)

  95. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5), 1–17 (1964)

  96. Rabiner, L., Juang, B.: An introduction to hidden Markov models. IEEE ASSP Mag. 3(1), 4–16 (1986)

  97. Raiffa, H.: Bayesian decision theory. Recent Developments in Information and Decision Processes, pp. 92–101. Macmillan, New York (1962)

  98. Rasmussen, C.E.: The infinite Gaussian mixture model. In: Neural Information Processing Systems, vol. 12, pp. 554–560 (1999)

  99. Rosenblatt, F.: The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65(6), 386 (1958)

  100. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Cogn. Model. 5(3), 1 (1988)

  101. Rumelhart, D.E., McClelland, J.L., Group, P.R., et al.: Parallel Distributed Processing, vol. 1. IEEE, New York (1988)

  102. Schlüter, J., Grill, T.: Exploring data augmentation for improved singing voice detection with neural networks. In: 16th International Society for Music Information Retrieval Conference (ISMIR-2015) (2015)

  103. Schölkopf, B., Herbrich, R., Smola, A.J.: A generalized representer theorem. In: International Conference on Computational Learning Theory, pp. 416–426. Springer, London (2001)

  104. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45(11), 2673–2681 (1997)

  105. Schwarz, G., et al.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)

  106. Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, Cambridge (2014)

  107. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)

  108. Sigtia, S., Benetos, E., Dixon, S.: An end-to-end neural network for polyphonic piano music transcription. IEEE/ACM Trans. Audio Speech Lang. Process. 24(5), 927–939 (2016)

  109. Sjöberg, J., Ljung, L.: Overtraining, regularization and searching for a minimum, with application to neural networks. Int. J. Control. 62(6), 1391–1407 (1995)

  110. Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, pp. 2951–2959 (2012)

  111. Sutskever, I., Martens, J., Dahl, G.E., Hinton, G.E.: On the importance of initialization and momentum in deep learning. In: Proceedings of the International Conference on International Conference on Machine Learning, ICML (3), vol. 28, pp. 1139–1147 (2013)

  112. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, pp. 3104–3112 (2014)

  113. Tang, D., Qin, B., Liu, T.: Document modeling with gated recurrent neural network for sentiment classification. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1422–1432 (2015)

  114. Tran, D., Kucukelbir, A., Dieng, A.B., Rudolph, M., Liang, D., Blei, D.M.: Edward: a library for probabilistic modeling, inference, and criticism (2016). arXiv preprint arXiv:1610.09787

  115. Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y.: Large margin methods for structured and interdependent output variables. J. Mach. Learn. Res. 6(Sep), 1453–1484 (2005)

  116. Van den Oord, A., Dieleman, S., Schrauwen, B.: Deep content-based music recommendation. In: Advances in Neural Information Processing Systems, pp. 2643–2651 (2013)

  117. Viterbi, A.: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory 13(2), 260–269 (1967)

  118. Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1(1–2), 1–305 (2008)

  119. Watanabe, S.: Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J. Mach. Learn. Res. 11(Dec), 3571–3594 (2010)

  120. Werbos, P.J.: Generalization of backpropagation with application to a recurrent gas market model. Neural Netw. 1(4), 339–356 (1988)

  121. Werbos, P.J.: Backpropagation through time: what it does and how to do it. Proc. IEEE 78(10), 1550–1560 (1990)

  122. Zadrozny, B., Elkan, C.: Transforming classifier scores into accurate multiclass probability estimates. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 694–699. ACM, New York (2002)

  123. Zuo, Z., Shuai, B., Wang, G., Liu, X., Wang, X., Wang, B., Chen, Y.: Convolutional recurrent neural networks: learning spatial dependencies for image representation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 18–26 (2015)

Author information

Correspondence to Brian McFee.


Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

McFee, B. (2018). Statistical Methods for Scene and Event Classification. In: Virtanen, T., Plumbley, M., Ellis, D. (eds) Computational Analysis of Sound Scenes and Events. Springer, Cham. https://doi.org/10.1007/978-3-319-63450-0_5

  • DOI: https://doi.org/10.1007/978-3-319-63450-0_5
  • Publisher Name: Springer, Cham
  • Print ISBN: 978-3-319-63449-4
  • Online ISBN: 978-3-319-63450-0
