A survey of tagging techniques for music, speech and environmental sound

Artificial Intelligence Review

Abstract

Sound tagging has been studied for years. Among all sound types, music, speech, and environmental sound are the three most active research areas. This survey provides an overview of the state of the art in these areas. We begin by discussing what tagging means in each sound area, and introduce some example applications of sound tagging to illustrate the significance of this research. Typical tagging techniques include manual, automatic, and semi-automatic approaches. After reviewing work on music, speech, and environmental sound tagging, we compare the three areas and summarise research progress to date. We identify research gaps in each area, as well as the common features of, and differences between, the three areas. We also list published datasets, tools used by researchers, and evaluation measures frequently applied in the analysis. Finally, we summarise the worldwide distribution of countries that have been active in sound tagging research.

Author information

Corresponding author

Correspondence to Shufei Duan.

About this article

Cite this article

Duan, S., Zhang, J., Roe, P. et al. A survey of tagging techniques for music, speech and environmental sound. Artif Intell Rev 42, 637–661 (2014). https://doi.org/10.1007/s10462-012-9362-y

  • DOI: https://doi.org/10.1007/s10462-012-9362-y
