Dictionary Learning for Bioacoustics Monitoring with Applications to Species Classification

Ruiz-Muñoz, J. F.; You, Zeyu; Raich, Raviv; Fern, Xiaoli Z.

doi:10.1007/s11265-016-1155-0

Published: 30 June 2016

Journal of Signal Processing Systems, volume 90, pages 233–247 (2018)

Abstract

This paper applies the convolutive version of dictionary learning to the analysis of in-situ audio recordings for bioacoustics monitoring. We propose an efficient approach for learning and using a sparse convolutive model to represent a collection of spectrograms. In this approach, we identify repeated bioacoustic patterns, e.g., bird syllables, as words and represent new spectrograms using these words. Moreover, we propose a supervised dictionary learning approach in the multiple-label setting to support multi-label classification of unlabeled spectrograms. Our approach relies on a random projection to reduce computational complexity. As a consequence, the non-negativity requirement on the dictionary words is relaxed. Furthermore, the proposed approach is well suited to a collection of discontinuous spectrograms. We evaluate our approach on synthetic examples and on two real datasets consisting of multiple bird audio recordings. Bird syllable dictionary learning from a real-world dataset is demonstrated. Additionally, we successfully apply the approach to spectrogram denoising and species classification.
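As a rough illustration of the convolutive model described in the abstract, the sketch below builds a toy spectrogram as a sum of dictionary words convolved along time with sparse activation signals, then applies a random projection. It is not the authors' implementation. The abstract does not say how the projection is applied, so projecting along the frequency axis with a Gaussian matrix is an assumption made here, and all names (`synthesize`, `words`, `activations`, `P`) and sizes are hypothetical.

```python
import numpy as np


def synthesize(words, activations):
    """Convolutive synthesis: sum of each word convolved along time
    with its activation signal.

    words:       (K, F, W) array, K words of F frequency bins by W frames.
    activations: (K, T) array, one (ideally sparse) activation per word.
    Returns an (F, T + W - 1) array.
    """
    K, F, W = words.shape
    T = activations.shape[1]
    out = np.zeros((F, T + W - 1))
    for k in range(K):
        for f in range(F):
            # Full 1-D convolution along the time axis for one frequency bin.
            out[f] += np.convolve(activations[k], words[k, f])
    return out


rng = np.random.default_rng(0)
F, W, T, K = 64, 5, 100, 2             # toy sizes (hypothetical)

words = rng.random((K, F, W))          # non-negative toy "syllables"
activations = np.zeros((K, T))
activations[0, [10, 60]] = 1.0         # word 0 occurs twice
activations[1, 30] = 0.5               # word 1 occurs once, at half amplitude
spectrogram = synthesize(words, activations)

# Assumed random projection along frequency: P maps F bins down to M < F.
M = 16
P = rng.standard_normal((M, F)) / np.sqrt(M)
projected_spec = P @ spectrogram
projected_words = np.einsum("mf,kfw->kmw", P, words)

# The projection acts on frequency and the convolution on time, so the two
# commute: the projected spectrogram follows the same convolutive model with
# projected words and unchanged activations.
projected_synth = synthesize(projected_words, activations)
print(np.allclose(projected_spec, projected_synth))
```

Under these assumptions the projected words contain negative entries even though the original words are non-negative, which is one way to see why a method working in the projected domain would relax the non-negativity requirement on the dictionary.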

Notes

https://www.kaggle.com/c/mlsp-2013-birds

Acknowledgments

This work is partially supported by the National Science Foundation grants CCF-1254218, DBI-1356792, IIS-1055113, and the Colciencias’ Doctoral Training Support Programme. The authors thank the anonymous reviewers for the valuable comments and suggestions.

Author information

Authors and Affiliations

Universidad Nacional de Colombia, Manizales, 170004, Colombia
J. F. Ruiz-Muñoz
School of EECS, Oregon State University, Corvallis, OR, 97331-5501, USA
Zeyu You, Raviv Raich & Xiaoli Z. Fern

Corresponding author

Correspondence to J. F. Ruiz-Muñoz.

About this article

Cite this article

Ruiz-Muñoz, J.F., You, Z., Raich, R. et al. Dictionary Learning for Bioacoustics Monitoring with Applications to Species Classification. J Sign Process Syst 90, 233–247 (2018). https://doi.org/10.1007/s11265-016-1155-0

Received: 01 January 2016
Revised: 07 May 2016
Accepted: 16 June 2016
Published: 30 June 2016
Issue Date: February 2018
DOI: https://doi.org/10.1007/s11265-016-1155-0
