Abstract
Sound tagging has been studied for years. Among all sound types, music, speech, and environmental sound are the three most active research areas. This survey provides an overview of state-of-the-art developments in these areas. We first discuss what tagging means in each sound area. We then introduce examples of sound tagging applications to illustrate the significance of this research. Typical tagging techniques include manual, automatic, and semi-automatic approaches. After reviewing work in music, speech, and environmental sound tagging, we compare the three areas and describe the research progress to date. We identify research gaps for each area, along with the features the three areas share and the differences between them. We list published datasets, tools used by researchers, and evaluation measures frequently applied in the analysis. Finally, we summarise the worldwide distribution of countries that have been active in sound tagging research.
Cite this article
Duan, S., Zhang, J., Roe, P. et al. A survey of tagging techniques for music, speech and environmental sound. Artif Intell Rev 42, 637–661 (2014). https://doi.org/10.1007/s10462-012-9362-y
DOI: https://doi.org/10.1007/s10462-012-9362-y