International Conference of the Cross-Language Evaluation Forum for European Languages

Experimental IR Meets Multilinguality, Multimodality, and Interaction, pp 261-267

Automatic Segmentation and Deep Learning of Bird Sounds

  • Hendrik Vincent Koops
  • Jan van Balen
  • Frans Wiering
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9283)


We present a study on automatic birdsong recognition with deep neural networks using the BirdCLEF2014 dataset. Deep learning learns feature hierarchies that represent the data at several levels of abstraction. It has been applied successfully to problems in fields such as music information retrieval and image recognition, but its use in bioacoustics is rare. We therefore investigate the application of a common deep learning technique (deep neural networks) to a classification task using songs from Amazonian birds. We show that various deep neural networks are capable of outperforming other classification methods. Furthermore, we present an automatic segmentation algorithm that is capable of separating bird sounds from non-bird sounds.
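The abstract does not describe how the segmentation algorithm works, so the following is only a minimal sketch of one generic way to separate active sound from background: a short-time energy threshold. It is not the authors' method. The function names, the frame length, the hop size, and the `ratio` threshold are all assumptions chosen for illustration, and the input is a synthetic tone rather than a real bird recording.

```python
import math


def frame_energies(signal, frame_len, hop):
    """Return the mean squared amplitude of each full frame of the signal."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def segment_by_energy(signal, frame_len=256, hop=128, ratio=0.1):
    """Return (start, end) sample ranges of consecutive frames whose energy
    exceeds ratio * (maximum frame energy).

    Hypothetical helper for illustration; the parameter values are assumptions.
    """
    energies = frame_energies(signal, frame_len, hop)
    if not energies:
        return []
    threshold = ratio * max(energies)
    segments = []
    seg_start = None
    for i, energy in enumerate(energies):
        active = energy > threshold
        if active and seg_start is None:
            # First active frame opens a new segment.
            seg_start = i * hop
        elif not active and seg_start is not None:
            # First inactive frame closes the segment at the end of the
            # previous (last active) frame.
            segments.append((seg_start, (i - 1) * hop + frame_len))
            seg_start = None
    if seg_start is not None:
        # The signal ended while a segment was still open.
        segments.append((seg_start, (len(energies) - 1) * hop + frame_len))
    return segments


# Synthetic input: silence, a 2 kHz tone standing in for a bird call, silence.
sample_rate = 8000
silence = [0.0] * 2048
tone = [math.sin(2 * math.pi * 2000 * n / sample_rate) for n in range(2048)]
signal = silence + tone + silence

segments = segment_by_energy(signal)
print(segments)
```

On this synthetic signal the single detected range brackets the tone, with a margin of up to one hop on each side because frames overlap. A threshold on raw energy is sensitive to loud non-bird noise, so a practical system would likely need spectral information as well; that choice is outside what the abstract specifies.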


Deep learning, Feature learning, Bioacoustics, Segmentation





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Hendrik Vincent Koops (1)
  • Jan van Balen (1)
  • Frans Wiering (1)
  1. Department of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands
