An audio signal is a signal that carries information in the audible frequency range. Audio representation refers to the extraction of audio signal properties, or features, that characterize the composition of the signal (in both the temporal and spectral domains) and its behavior over time. Feature extraction is typically combined with feature selection, which determines the best set of features for the intended operation on the audio signal.
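As an illustration of temporal and spectral features, the sketch below splits a signal into short frames and computes three commonly used descriptors per frame: zero-crossing rate and short-time energy (temporal), and spectral centroid (spectral). The choice of features, the function names, the frame length of 64 samples, and the hop of 32 samples are all assumptions made for this example, not taken from the entry. A naive discrete Fourier transform is used only to keep the sketch dependency-free; an FFT library would normally be used instead.

```python
import cmath
import math


def frames(signal, frame_len, hop):
    """Split a sample sequence into overlapping fixed-length frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Temporal feature: fraction of adjacent sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Temporal feature: mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def magnitude_spectrum(frame):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (illustrative only)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectral_centroid(frame, sample_rate):
    """Spectral feature: magnitude-weighted mean frequency in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0  # silent frame: centroid is undefined, report 0 by convention
    bin_hz = sample_rate / len(frame)
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


def extract_features(signal, sample_rate, frame_len=64, hop=32):
    """Return one (zcr, energy, centroid) tuple per frame."""
    return [(zero_crossing_rate(f),
             short_time_energy(f),
             spectral_centroid(f, sample_rate))
            for f in frames(signal, frame_len, hop)]


# Hypothetical test input: a 1 kHz sine tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
for zcr, energy, centroid in extract_features(tone, sample_rate):
    print(f"zcr={zcr:.3f}  energy={energy:.3f}  centroid={centroid:.1f} Hz")
```

With these settings each 64-sample frame is summarized by three numbers. Which features are actually worth keeping depends on the task, which is where feature selection comes in.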
Audio feature extraction typically yields a strongly reduced representation of the audio signal. Such a representation can make audio processing more efficient and benefit the many applications built on that processing. For example, a compact representation of an audio signal in the form of a fingerprint can enable extremely fast search for a match between this signal and a large-scale audio database for the purpose of audio signal...
Keywords: Discrete Fourier Transform, Audio Signal, Audio Feature, Music Signal, Audio Frame