Journal of Signal Processing Systems, Volume 90, Issue 2, pp 233–247

Dictionary Learning for Bioacoustics Monitoring with Applications to Species Classification

  • J. F. Ruiz-Muñoz
  • Zeyu You
  • Raviv Raich
  • Xiaoli Z. Fern


This paper applies the convolutive version of dictionary learning to the analysis of in-situ audio recordings for bioacoustics monitoring. We propose an efficient approach for learning and using a sparse convolutive model to represent a collection of spectrograms. In this approach, we identify repeated bioacoustic patterns, e.g., bird syllables, as words and represent new spectrograms using these words. Moreover, we propose a supervised dictionary learning approach in the multiple-label setting to support multi-label classification of unlabeled spectrograms. Our approach relies on a random projection to reduce computational complexity. As a consequence, the non-negativity requirement on the dictionary words is relaxed. Furthermore, the proposed approach is well-suited for a collection of discontinuous spectrograms. We evaluate our approach on synthetic examples and on two real datasets consisting of audio recordings of multiple birds. We demonstrate bird syllable dictionary learning on a real-world dataset. Additionally, we successfully apply the approach to spectrogram denoising and species classification.
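As a rough illustration of the kind of model the abstract describes, the sketch below (not the authors' implementation) builds a spectrogram as a sum of time-frequency words, each convolved in time with a sparse activation signal, and then applies a Gaussian random projection along the frequency axis. All names, array sizes, and the Gaussian choice of projection are assumptions made for illustration; the abstract does not specify them. Because the projection acts on each time frame independently, it commutes with convolution in time, so the same activations explain the projected spectrogram using projected words, and those projected words are in general no longer non-negative.

```python
import numpy as np


def synthesize(words, activations):
    """Sum over words of each word convolved in time with its activation.

    words:       (K, F, W) array, K time-frequency patches of W frames each
    activations: (K, T) array, mostly zeros (sparse)
    returns:     (F, T + W - 1) array
    """
    K, F, W = words.shape
    T = activations.shape[1]
    out = np.zeros((F, T + W - 1))
    for k in range(K):
        for f in range(F):
            # 1-D convolution along time for each frequency row of word k
            out[f] += np.convolve(words[k, f], activations[k])
    return out


rng = np.random.default_rng(0)
# Hypothetical sizes: P < F is the projected (reduced) frequency dimension
K, F, W, T, P = 2, 64, 5, 40, 8

words = rng.random((K, F, W))        # non-negative words (e.g. syllables)
activations = np.zeros((K, T))
activations[0, [3, 20]] = 1.0        # word 0 occurs twice
activations[1, [11]] = 0.5           # word 1 occurs once

spectrogram = synthesize(words, activations)

# Assumed Gaussian random projection along the frequency axis
R = rng.standard_normal((P, F)) / np.sqrt(P)
projected_spectrogram = R @ spectrogram
projected_words = np.einsum("pf,kfw->kpw", R, words)

# Projection along frequency commutes with convolution along time
commutes = np.allclose(
    projected_spectrogram, synthesize(projected_words, activations)
)
# Projected words can take negative values even though the originals do not
has_negative_entries = bool((projected_words < 0).any())
```

In this toy setting, a learning procedure could work entirely with the `P`-row projected spectrogram instead of the `F`-row original, which is where the computational saving would come from under these assumptions.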


Dictionary learning, Random matrix projection, Classification



This work is partially supported by National Science Foundation grants CCF-1254218, DBI-1356792, and IIS-1055113, and by the Colciencias Doctoral Training Support Programme. The authors thank the anonymous reviewers for their valuable comments and suggestions.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • J. F. Ruiz-Muñoz (1)
  • Zeyu You (2)
  • Raviv Raich (2)
  • Xiaoli Z. Fern (2)

  1. Universidad Nacional de Colombia, Manizales, Colombia
  2. School of EECS, Oregon State University, Corvallis, USA
