Learnt dictionary based active learning method for environmental sound event tagging

  • Xiao Qin
  • Wanting Ji (Email author)
  • Ruili Wang
  • ChangAn Yuan


Sound event tagging is the process of adding text labels to sound segments based on their salient features and/or annotations. In practice, manual annotation is expensive, so tagged sound segments are scarce, while untagged sound segments can be obtained easily and cheaply. Semi-automatic tagging, which assigns labels to a large number of untagged sound segments from a small number of manually annotated ones, is therefore very important. Active learning is an effective technique for this problem: selected sound segments are tagged manually, and the remaining segments are tagged automatically. In this paper, a learnt-dictionary-based active learning method is proposed for environmental sound event tagging, which can significantly reduce the annotation cost of semi-automatic tagging. The proposed method is built on a learnt dictionary because dictionary learning is better adapted to sound feature extraction. Tagging accuracy and annotation cost are used to measure the performance of the proposed method. Experimental results demonstrate that the proposed method achieves higher tagging accuracy at a much lower annotation cost than other existing methods.
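The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of cluster-based active learning, not the authors' method. It assumes each sound segment has already been reduced to a feature vector (for instance, sparse codes over a learnt dictionary; here replaced by synthetic 2-D points), groups the vectors with a plain k-medoids loop, asks a hypothetical `oracle` to annotate only the medoids, and copies each medoid's tag to the other members of its cluster. The function names, the farthest-first initialisation, and the toy data are all assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, points, medoids):
    """Index (into `medoids`) of the medoid closest to point `p`."""
    return min(range(len(medoids)), key=lambda c: sq_dist(p, points[medoids[c]]))


def k_medoids(points, k, n_iter=20):
    """Alternating k-medoids: assign points, then re-pick each cluster's medoid."""
    # Farthest-first initialisation (an assumption, chosen to be deterministic).
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(
            range(len(points)),
            key=lambda i: min(sq_dist(points[i], points[m]) for m in medoids),
        ))
    for _ in range(n_iter):
        assign = [nearest(p, points, medoids) for p in points]
        updated = []
        for c in range(k):
            members = [i for i, a in enumerate(assign) if a == c]
            if not members:  # keep the old medoid if a cluster is empty
                updated.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the rest.
            updated.append(min(
                members,
                key=lambda i: sum(sq_dist(points[i], points[j]) for j in members),
            ))
        if updated == medoids:
            break
        medoids = updated
    assign = [nearest(p, points, medoids) for p in points]
    return medoids, assign


def tag_by_medoids(points, oracle, k):
    """Manually tag only the k medoids; propagate each tag to its cluster."""
    medoids, assign = k_medoids(points, k)
    medoid_tags = [oracle(i) for i in medoids]  # the only manual annotations
    tags = [medoid_tags[c] for c in assign]
    return tags, len(medoids)  # annotation cost = number of oracle queries


# Toy data: two well-separated groups standing in for two sound classes.
rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(20)]
points += [(10 + rng.uniform(-1, 1), 10 + rng.uniform(-1, 1)) for _ in range(20)]
true_tags = ["rain"] * 20 + ["traffic"] * 20

tags, cost = tag_by_medoids(points, lambda i: true_tags[i], k=2)
accuracy = sum(t == y for t, y in zip(tags, true_tags)) / len(true_tags)
print(f"tagging accuracy = {accuracy:.2f}, annotation cost = {cost} of {len(points)}")
```

The two quantities printed at the end mirror the two measures named in the abstract: tagging accuracy and annotation cost, the latter counted here simply as the number of segments sent to the annotator.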


Internet of things; Dictionary learning; Sparse coding; Active learning; k-medoids clustering; Sound event tagging



This work is partially supported by the National Natural Science Foundation of Guangxi under Grants 2016GXNSFAA380209 and 2014GXNSFDA118037, the Natural Science Foundation of Zhejiang Province (No. LY18F010008), the “BAGUI Scholar” Program of Guangxi Zhuang Autonomous Region of China, the project of Scientific Research and Technology Development (AB16380272, AA18118047) in Guangxi, and the project of Scientific Research and Technology Development (#20175177) in Guangxi Nanning.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Xiao Qin (1)
  • Wanting Ji (2, 3), Email author
  • Ruili Wang (2, 3)
  • ChangAn Yuan (1)

  1. Nanning Normal University, Nanning, China
  2. Zhejiang Gongshang University, Hangzhou, China
  3. Massey University, Auckland, New Zealand
