Dictionary Learning for Bioacoustics Monitoring with Applications to Species Classification

Journal of Signal Processing Systems

Abstract

This paper applies the convolutive version of dictionary learning to in-situ audio recordings for bioacoustics monitoring. We propose an efficient approach for learning and using a sparse convolutive model to represent a collection of spectrograms. In this approach, we identify repeated bioacoustic patterns (e.g., bird syllables) as words and represent new spectrograms in terms of these words. We also propose a supervised dictionary learning approach in the multi-label setting to support multi-label classification of unlabeled spectrograms. Our approach relies on a random projection to reduce computational complexity; as a consequence, the non-negativity requirement on the dictionary words is relaxed. Furthermore, the proposed approach is well suited to collections of discontinuous spectrograms. We evaluate our approach on synthetic examples and on two real datasets of audio recordings of multiple birds. We demonstrate bird-syllable dictionary learning on a real-world dataset, and we successfully apply the approach to spectrogram denoising and species classification.
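As a minimal sketch, assume the common convolutive formulation in which each dictionary word D_k is a short frequency-by-time patch and a spectrogram is a sum of words shifted in time and scaled by sparse activations a_k, i.e. X[f][t] ≈ Σ_k Σ_l D_k[f][l] a_k[t − l]. This is not the authors' implementation: the function names `synthesize`, `random_projection` and `project`, the Gaussian projection applied to the frequency axis, and the toy data are all hypothetical. The sketch shows one property that makes a random projection R compatible with such a model: because synthesis is linear in the words, projecting X gives the same result as synthesizing from the projected words R·D_k. Those projected words generally contain negative entries, which is one way to see why non-negativity on the words can no longer be required.

```python
import random


def synthesize(words, activations, n_frames):
    """Convolutive synthesis: X[f][t] = sum_k sum_l D_k[f][l] * a_k[t - l].

    words: K dictionary words, each an F x L nested list (frequency bins x lags).
    activations: K activation sequences of length n_frames (mostly zeros).
    Returns an F x n_frames spectrogram as a nested list.
    """
    n_freq = len(words[0])
    x = [[0.0] * n_frames for _ in range(n_freq)]
    for word, act in zip(words, activations):
        n_lags = len(word[0])
        for t, a in enumerate(act):
            if a == 0:
                continue  # sparse codes: most frames contribute nothing
            for lag in range(n_lags):
                if t + lag >= n_frames:
                    break  # truncate a word that runs past the last frame
                for f in range(n_freq):
                    x[f][t + lag] += a * word[f][lag]
    return x


def random_projection(m, n_freq, seed=0):
    """Hypothetical m x F Gaussian projection matrix with entries ~ N(0, 1/m)."""
    rng = random.Random(seed)
    sigma = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, sigma) for _ in range(n_freq)] for _ in range(m)]


def project(r, mat):
    """Matrix product r @ mat for nested lists: (m x F) times (F x N) -> m x N."""
    n_freq, n_cols = len(mat), len(mat[0])
    return [[sum(row[f] * mat[f][j] for f in range(n_freq)) for j in range(n_cols)]
            for row in r]


if __name__ == "__main__":
    # Toy data (hypothetical): two 3-bin x 2-frame "syllables" over 6 frames.
    words = [[[1.0, 0.5], [0.0, 1.0], [0.2, 0.0]],
             [[0.0, 0.3], [1.0, 0.0], [0.5, 0.5]]]
    acts = [[0, 1.0, 0, 0, 0, 0.5],
            [0, 0, 0, 2.0, 0, 0]]
    x = synthesize(words, acts, 6)

    # Projecting the spectrogram equals synthesizing from projected words,
    # because synthesis is linear in the words.
    r = random_projection(2, 3, seed=1)
    lhs = project(r, x)
    rhs = synthesize([project(r, w) for w in words], acts, 6)
    print(max(abs(a - b) for ra, rb in zip(lhs, rhs) for a, b in zip(ra, rb)) < 1e-9)  # prints True
```

Projecting F frequency bins down to m < F rows shrinks every word from F×L to m×L entries, which is where the computational saving of a scheme like this would come from.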

[Figures 1–8]

Notes

  1. https://www.kaggle.com/c/mlsp-2013-birds

Acknowledgments

This work is partially supported by the National Science Foundation grants CCF-1254218, DBI-1356792, IIS-1055113, and the Colciencias’ Doctoral Training Support Programme. The authors thank the anonymous reviewers for their valuable comments and suggestions.

Author information

Correspondence to J. F. Ruiz-Muñoz.

About this article

Cite this article

Ruiz-Muñoz, J.F., You, Z., Raich, R. et al. Dictionary Learning for Bioacoustics Monitoring with Applications to Species Classification. J Sign Process Syst 90, 233–247 (2018). https://doi.org/10.1007/s11265-016-1155-0

  • DOI: https://doi.org/10.1007/s11265-016-1155-0