Abstract
This paper applies convolutive dictionary learning to the analysis of in-situ audio recordings for bioacoustics monitoring. We propose an efficient approach for learning and using a sparse convolutive model to represent a collection of spectrograms. In this approach, we identify repeated bioacoustic patterns, e.g., bird syllables, as words and represent new spectrograms using these words. Moreover, we propose a supervised dictionary learning approach in the multi-label setting to support multi-label classification of unlabeled spectrograms. Our approach relies on a random projection to reduce computational complexity. As a consequence, the non-negativity requirement on the dictionary words is relaxed. Furthermore, the proposed approach is well suited to a collection of discontinuous spectrograms. We evaluate our approach on synthetic examples and on two real datasets consisting of audio recordings of multiple birds. We demonstrate bird syllable dictionary learning on a real-world dataset. Additionally, we successfully apply the approach to spectrogram denoising and species classification.
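To make the convolutive model and the role of the random projection concrete, the following is a minimal sketch rather than the method of the paper. It assumes one common form of the model, in which a spectrogram is a sum of time-frequency words, each convolved along time with a sparse activation signal, and it assumes the random projection is a Gaussian matrix applied along the frequency axis. All names and sizes (`words`, `acts`, `synthesize`, `F`, `T`, `K`, `W`, `M`) are hypothetical. Under these assumptions the projected spectrogram follows the same convolutive model with projected words, and those projected words are no longer non-negative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not values from the paper).
F, T = 64, 200   # frequency bins, time frames
K, W = 3, 10     # number of dictionary words, word width in frames
M = 16           # projected dimension, M < F

# Non-negative words (K x F x W) and sparse activations (K x (T - W + 1)).
words = rng.random((K, F, W))
acts = np.zeros((K, T - W + 1))
for k in range(K):
    onsets = rng.choice(T - W + 1, size=4, replace=False)
    acts[k, onsets] = rng.random(4) + 0.5


def synthesize(words, acts):
    """Sum over words of each word convolved along time with its activation."""
    n_words, n_rows, width = words.shape
    n_frames = acts.shape[1] + width - 1
    X = np.zeros((n_rows, n_frames))
    for k in range(n_words):
        for r in range(n_rows):
            # Full 1-D convolution along time for one row of one word.
            X[r] += np.convolve(acts[k], words[k, r])
    return X


X = synthesize(words, acts)                      # F x T spectrogram

# Gaussian random projection along the frequency axis (an assumed choice).
P = rng.standard_normal((M, F)) / np.sqrt(M)
X_proj = P @ X                                   # M x T projected spectrogram
words_proj = np.einsum("mf,kfw->kmw", P, words)  # projected words, K x M x W

# Same activations, projected words: the convolutive structure is preserved.
X_proj_model = synthesize(words_proj, acts)
print(np.allclose(X_proj, X_proj_model))
print(bool((words_proj < 0).any()))
```

Because the projection acts on frequency and the convolution acts on time, the two operations commute. In this sketch, a model fitted in the lower-dimensional space can therefore reuse the same activations, at the cost of words that may take negative values.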
Acknowledgments
This work is partially supported by National Science Foundation grants CCF-1254218, DBI-1356792, and IIS-1055113, and by the Colciencias’ Doctoral Training Support Programme. The authors thank the anonymous reviewers for their valuable comments and suggestions.
Cite this article
Ruiz-Muñoz, J.F., You, Z., Raich, R. et al. Dictionary Learning for Bioacoustics Monitoring with Applications to Species Classification. J Sign Process Syst 90, 233–247 (2018). https://doi.org/10.1007/s11265-016-1155-0
DOI: https://doi.org/10.1007/s11265-016-1155-0