Abstract
Over the past two decades, considerable effort has been devoted to the automatic classification of bird sounds, which can enable long-term, unattended, and low-energy-consumption ubiquitous computing systems for monitoring nature reserves. Nevertheless, hand-crafted features require extensive domain knowledge, which inevitably makes the design process time-consuming and expensive. To this end, we propose a sequence-to-sequence deep learning approach that automatically extracts higher representations from bird sounds without any human expert knowledge. First, we transform the bird sound recordings into spectrograms. Subsequently, higher representations are learnt by an autoencoder-based encoder-decoder paradigm built on deep recurrent neural networks. Finally, two typical machine learning models, i.e., support vector machines and multi-layer perceptrons, are used to predict the species. Experimental results demonstrate the effectiveness of the proposed method, which achieves an unweighted average recall (UAR) of 66.8% in recognising 86 species of birds.
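The pipeline summarised above (spectrogram extraction, recurrent sequence-to-sequence autoencoding, and an SVM or MLP back-end scored by UAR) can be illustrated with a minimal Python sketch. The example below is an assumption of how such a system might be wired together, not the authors' implementation: it uses librosa for log-mel spectrograms, a small PyTorch GRU encoder-decoder whose final encoder state serves as the learnt representation, and scikit-learn for the classifier and the macro-averaged recall (i.e., UAR). The helper load_spectrogram, the layer sizes, and the toy data are all illustrative placeholders.

```python
import librosa
import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics import recall_score
from sklearn.svm import SVC


def load_spectrogram(path, sr=16000, n_mels=128):
    """Turn one recording into a log-mel spectrogram of shape (time, n_mels)."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max).T.astype(np.float32)


class Seq2SeqAutoencoder(nn.Module):
    """GRU encoder-decoder: the final encoder state is used as a fixed-length
    representation of a variable-length spectrogram."""

    def __init__(self, n_mels=128, hidden=256):
        super().__init__()
        self.encoder = nn.GRU(n_mels, hidden, batch_first=True)
        self.decoder = nn.GRU(n_mels, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_mels)

    def forward(self, x):                              # x: (batch, time, n_mels)
        _, h = self.encoder(x)                         # h: (1, batch, hidden)
        # Decoder reconstructs the input frames, conditioned on the encoder state.
        dec_in = torch.cat([torch.zeros_like(x[:, :1]), x[:, :-1]], dim=1)
        dec_out, _ = self.decoder(dec_in, h)
        return self.out(dec_out), h.squeeze(0)         # reconstruction, representation


# Toy stand-in data so the sketch runs end-to-end; in practice each item would
# be load_spectrogram(path) for one recording, with labels over 86 species.
rng = np.random.default_rng(0)
specs = [rng.standard_normal((100, 128)).astype(np.float32) for _ in range(20)]
labels = np.array([i % 2 for i in range(20)])

model = Seq2SeqAutoencoder()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
for spec in specs:                                     # unsupervised reconstruction training
    x = torch.from_numpy(spec).unsqueeze(0)
    recon, _ = model(x)
    loss = nn.functional.mse_loss(recon, x)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()

with torch.no_grad():                                  # frozen encoder states become features
    feats = np.stack([model(torch.from_numpy(s).unsqueeze(0))[1].numpy()[0] for s in specs])

# Linear SVM back-end; sklearn.neural_network.MLPClassifier is the analogous MLP back-end.
clf = SVC(kernel="linear")
clf.fit(feats[:15], labels[:15])
pred = clf.predict(feats[15:])
print("UAR:", recall_score(labels[15:], pred, average="macro"))  # UAR = unweighted (macro) recall
```

In this sketch the autoencoder is trained purely on reconstruction, so no bird labels are needed for representation learning; the labels enter only when the frozen encoder features are fed to the classifier, mirroring the two-stage design described in the abstract.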
This work was partially supported by the National Natural Science Foundation of China (Grant No. 61702370), P. R. China, the Key Program of the Natural Science Foundation of Tianjin (Grant No. 18JCZDJC36300), P. R. China, the Open Projects Program of the National Laboratory of Pattern Recognition, P. R. China, the Zhejiang Lab’s International Talent Fund for Young Professionals (Project HANAMI), P. R. China, the JSPS Postdoctoral Fellowship for Research in Japan (ID No. P19081) from the Japan Society for the Promotion of Science (JSPS), Japan, and the Grants-in-Aid for Scientific Research (No. 19F19081) from the Ministry of Education, Culture, Sports, Science and Technology, Japan.
Cite this paper
Qiao, Y., Qian, K., Zhao, Z. (2020). Learning Higher Representations from Bioacoustics: A Sequence-to-Sequence Deep Learning Approach for Bird Sound Classification. In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1333. Springer, Cham. https://doi.org/10.1007/978-3-030-63823-8_16