Dictionary Learning for Bioacoustics Monitoring with Applications to Species Classification
This paper applies convolutive dictionary learning to the analysis of in-situ audio recordings for bioacoustics monitoring. We propose an efficient approach for learning and using a sparse convolutive model to represent a collection of spectrograms. In this approach, we identify repeated bioacoustic patterns, e.g., bird syllables, as words and represent new spectrograms using these words. Moreover, we propose a supervised dictionary learning approach in the multi-label setting to support multi-label classification of unlabeled spectrograms. Our approach relies on a random projection to reduce computational complexity. As a consequence, the non-negativity requirement on the dictionary words is relaxed. Furthermore, the proposed approach is well suited to a collection of discontinuous spectrograms. We evaluate our approach on synthetic examples and on two real datasets consisting of audio recordings of multiple birds. We demonstrate bird syllable dictionary learning on a real-world dataset. Additionally, we successfully apply the approach to spectrogram denoising and species classification.
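To make the convolutive model and the role of the random projection concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a spectrogram is modeled as a sum of words, each convolved in time with a sparse activation signal, and that the random projection is a Gaussian matrix applied along the frequency axis. The names `reconstruct`, `words`, `activations`, and `R`, as well as all sizes, are hypothetical. The sketch only checks that projecting the spectrogram is equivalent to projecting the words, and that the projected words are no longer non-negative.

```python
import numpy as np

def reconstruct(words, activations):
    """Convolutive model: sum over words of (word convolved in time with its activation).

    words:       array (K, F, W), K words with F frequency bins and W time frames
    activations: array (K, T), one activation signal per word
    returns:     array (F, T + W - 1)
    """
    K, F, W = words.shape
    T = activations.shape[1]
    out = np.zeros((F, T + W - 1))
    for k in range(K):
        for f in range(F):
            # Full 1-D convolution along time for each frequency bin.
            out[f] += np.convolve(words[k, f], activations[k])
    return out

rng = np.random.default_rng(0)
K, F, W, T, P = 3, 64, 8, 100, 16  # illustrative sizes (assumptions)

# Non-negative words and sparse non-negative activations.
words = rng.random((K, F, W))
activations = np.zeros((K, T))
for k in range(K):
    idx = rng.choice(T, size=4, replace=False)
    activations[k, idx] = rng.random(4) + 0.5

X = reconstruct(words, activations)  # synthetic spectrogram, F x (T + W - 1)

# Assumed Gaussian random projection along frequency: F bins -> P rows.
R = rng.standard_normal((P, F)) / np.sqrt(P)
projected_words = np.einsum("pf,kfw->kpw", R, words)
X_proj = reconstruct(projected_words, activations)

# By linearity, projecting the spectrogram equals reconstructing from projected words.
print(np.allclose(R @ X, X_proj))
# Projected words can take negative values, so non-negativity no longer holds.
print(projected_words.min() < 0)
```

Because the convolution acts only along time and the projection only along frequency, the two operations commute, which is why the activations could in principle be estimated in the lower-dimensional projected domain.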
Keywords: Dictionary learning, Random matrix projection, Classification
This work is partially supported by the National Science Foundation grants CCF-1254218, DBI-1356792, IIS-1055113, and the Colciencias’ Doctoral Training Support Programme. The authors thank the anonymous reviewers for their valuable comments and suggestions.