Artificial Intelligence Review, Volume 42, Issue 4, pp 637–661

A survey of tagging techniques for music, speech and environmental sound

  • Shufei Duan
  • Jinglan Zhang
  • Paul Roe
  • Michael Towsey

DOI: 10.1007/s10462-012-9362-y

Cite this article as:
Duan, S., Zhang, J., Roe, P. et al. Artif Intell Rev (2014) 42: 637. doi:10.1007/s10462-012-9362-y

Abstract

Sound tagging has been studied for years. Among all sound types, music, speech, and environmental sound are the three most active research areas. This survey provides an overview of state-of-the-art developments in these areas. We begin by discussing what tagging means in each sound area. Some examples of sound tagging applications are then introduced to illustrate the significance of this research. Typical tagging techniques include manual, automatic, and semi-automatic approaches. After reviewing work in music, speech, and environmental sound tagging, we compare the three areas and summarise the research progress to date. Research gaps are identified for each area, and the common features and differences among the three areas are also identified. Published datasets, tools used by researchers, and evaluation measures frequently applied in the analysis are listed. Finally, we summarise the worldwide distribution of countries that have been active in sound tagging research over the years.

Keywords

Sound tagging; Music tagging; Speech recognition; Environmental sound tagging; Manual tagging; Automatic tagging; Semi-automatic tagging

Copyright information

© Springer Science+Business Media Dordrecht 2012

Authors and Affiliations

  • Shufei Duan (1)
  • Jinglan Zhang (1)
  • Paul Roe (1)
  • Michael Towsey (1)

  1. Faculty of Science and Engineering, Queensland University of Technology, Brisbane, Australia