Auditory Sketches: Sparse Representations of Sounds Based on Perceptual Models
An important question for both signal processing and auditory science is to understand which features of a sound carry the most important information for the listener. Here we approach the issue by introducing the idea of “auditory sketches”: sparse representations of sounds, severely impoverished compared to the original, which nevertheless afford good performance on a given perceptual task. Starting from biologically-grounded representations (auditory models), a sketch is obtained by reconstructing a highly under-sampled selection of elementary atoms. Then, the sketch is evaluated with a psychophysical experiment involving human listeners. The process can be repeated iteratively. As a proof of concept, we present data for an emotion recognition task with short non-verbal sounds. We investigate 1/ the type of auditory representation that can be used for sketches 2/ the selection procedure to sparsify such representations 3/ the smallest number of atoms that can be kept 4/ the robustness to noise. Results indicate that it is possible to produce recognizable sketches with a very small number of atoms per second. Furthermore, at least in our experimental setup, a simple and fast under-sampling method based on selecting local maxima of the representation seems to perform as well or better than a more traditional algorithm aimed at minimizing the reconstruction error. Thus, auditory sketches may be a useful tool for choosing sparse dictionaries, and also for identifying the minimal set of features required in a specific perceptual task.
Unable to display preview. Download preview PDF.
- 1.Mallat, S.: A Wavelet Tour of Signal Processing - The Sparse Way, 3rd edn. Academic Press (December 2008)Google Scholar
- 4.Elad, M.: Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer (2010)Google Scholar
- 9.Patil, K., Pressnitzer, D., Shamma, S., Elhilali, M.: Music in our ears: the biological bases of musical timbre perception. PLoS Comp. Biol. 8(11), e1002759 (2012)Google Scholar
- 10.Portilla, J.: Image restoration through l0 analysis-based sparse optimization in tight frames. In: Proc. IEEE Int’l Conference on Image Processing (ICIP), pp. 3865–3868 (2009)Google Scholar
- 14.Sturmel, N., Daudet, L.: Signal reconstruction from its STFT magnitude: a state of the art. In: Proc. International Conference on Digital Audio Effects, DAFx 2011 (2011)Google Scholar
- 19.Hoogenboom, R., Lew, M.: Face detection using local maxima. In: Proc. Int’l Conference on Automatic Face and Gesture Recognition, 334–339 (1996)Google Scholar
- 22.Peyré, G., Fadili, J.: Learning analysis sparsity priors. In: Int’l Conference on Sampling Theory and Applications, SAMPTA (2011)Google Scholar
- 23.Nam, S., Davies, M., Elad, M., Gribonval, R.: Cosparse analysis modeling - uniqueness and algorithms. In: Proc. IEEE Int’l Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5804–5807 (2011)Google Scholar
- 25.Mesgarani, N., Shamma, S.A.: Speech enhancement using spectro-temporal modulations. EURASIP Journal on Audio, Speech, and Music Processing V, ID 42357 (2007)Google Scholar