A survey of tagging techniques for music, speech and environmental sound

Duan, Shufei; Zhang, Jinglan; Roe, Paul; Towsey, Michael

doi:10.1007/s10462-012-9362-y

Published: 25 October 2012

Volume 42, pages 637–661, (2014)

Artificial Intelligence Review

Abstract

Sound tagging has been studied for many years. Among all sound types, music, speech and environmental sound are the three most active research areas. This survey provides an overview of the state of the art in these areas. We begin by discussing what tagging means in each sound area, and introduce examples of sound tagging applications to illustrate the significance of this research. Typical tagging techniques include manual, automatic and semi-automatic approaches. After reviewing work on music, speech and environmental sound tagging, we compare the three areas and summarise the research progress to date. Research gaps are identified for each area, and both the features the three areas share and the differences between them are discussed. Published datasets, tools used by researchers and frequently applied evaluation measures are listed. Finally, we summarise the worldwide distribution of countries engaged in sound tagging research over the years.
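Automatic tagging is commonly framed as multi-label classification: each clip may receive several tags from a fixed vocabulary. The abstract does not say which evaluation measures the survey lists, so the following is only an illustrative sketch, assuming the widely used per-tag precision, recall and F-score, macro-averaged over the tag vocabulary. The function names `per_tag_scores` and `macro_f1` are hypothetical.

```python
from typing import Dict, List, Set


def per_tag_scores(
    truth: List[Set[str]], predicted: List[Set[str]], vocabulary: List[str]
) -> Dict[str, Dict[str, float]]:
    """Precision, recall and F-score for each tag, computed across all clips.

    truth[i] and predicted[i] are the tag sets of clip i (hypothetical format).
    """
    scores: Dict[str, Dict[str, float]] = {}
    for tag in vocabulary:
        pairs = list(zip(truth, predicted))
        # True positives: tag is in the ground truth and was predicted.
        tp = sum(1 for t, p in pairs if tag in t and tag in p)
        # False positives: tag was predicted but is not in the ground truth.
        fp = sum(1 for t, p in pairs if tag not in t and tag in p)
        # False negatives: tag is in the ground truth but was not predicted.
        fn = sum(1 for t, p in pairs if tag in t and tag not in p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[tag] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def macro_f1(scores: Dict[str, Dict[str, float]]) -> float:
    """Unweighted mean of the per-tag F-scores (macro average)."""
    return sum(s["f1"] for s in scores.values()) / len(scores)
```

Macro-averaging weights rare and frequent tags equally, which matters because tag vocabularies are often highly imbalanced; a micro average (pooling counts over all tags before dividing) would instead be dominated by the most frequent tags.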

Author information

Authors and Affiliations

Faculty of Science and Engineering, Queensland University of Technology, Brisbane, QLD, Australia
Shufei Duan, Jinglan Zhang, Paul Roe & Michael Towsey

Corresponding author

Correspondence to Shufei Duan.

About this article

Cite this article

Duan, S., Zhang, J., Roe, P. et al. A survey of tagging techniques for music, speech and environmental sound. Artif Intell Rev 42, 637–661 (2014). https://doi.org/10.1007/s10462-012-9362-y

Published: 25 October 2012
Issue Date: December 2014
DOI: https://doi.org/10.1007/s10462-012-9362-y
