Auditory Sketches: Sparse Representations of Sounds Based on Perceptual Models

  • Clara Suied
  • Angélique Drémeau
  • Daniel Pressnitzer
  • Laurent Daudet
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7900)


An important question for both signal processing and auditory science is which features of a sound carry the most important information for the listener. Here we approach the issue by introducing the idea of “auditory sketches”: sparse representations of sounds, severely impoverished compared to the original, which nevertheless afford good performance on a given perceptual task. Starting from biologically grounded representations (auditory models), a sketch is obtained by reconstructing a sound from a highly under-sampled selection of elementary atoms. The sketch is then evaluated in a psychophysical experiment involving human listeners. The process can be repeated iteratively. As a proof of concept, we present data for an emotion recognition task with short non-verbal sounds. We investigate (1) the type of auditory representation that can be used for sketches, (2) the selection procedure used to sparsify such representations, (3) the smallest number of atoms that can be kept, and (4) the robustness to noise. Results indicate that it is possible to produce recognizable sketches with a very small number of atoms per second. Furthermore, at least in our experimental setup, a simple and fast under-sampling method based on selecting local maxima of the representation seems to perform as well as or better than a more traditional algorithm aimed at minimizing the reconstruction error. Thus, auditory sketches may be a useful tool for choosing sparse dictionaries, and also for identifying the minimal set of features required in a specific perceptual task.
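The local-maxima selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the auditory representation is available as a non-negative 2D grid (rows as frequency channels, columns as time frames), uses a toy grid in place of a real auditory model, and omits the reconstruction of a waveform from the retained atoms. The names `local_maxima` and `sketch_by_peak_picking` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def local_maxima(rep: Matrix) -> List[Tuple[float, int, int]]:
    """Return (value, row, col) for every cell strictly greater than all its neighbours."""
    n_rows, n_cols = len(rep), len(rep[0])
    peaks = []
    for i in range(n_rows):
        for j in range(n_cols):
            v = rep[i][j]
            # Up to 8 surrounding cells, clipped at the grid borders.
            neighbours = [
                rep[a][b]
                for a in range(max(0, i - 1), min(n_rows, i + 2))
                for b in range(max(0, j - 1), min(n_cols, j + 2))
                if (a, b) != (i, j)
            ]
            if all(v > u for u in neighbours):
                peaks.append((v, i, j))
    return peaks


def sketch_by_peak_picking(rep: Matrix, n_atoms: int) -> Matrix:
    """Keep the n_atoms largest local maxima and set every other coefficient to zero."""
    peaks = sorted(local_maxima(rep), reverse=True)[:n_atoms]
    sparse = [[0.0] * len(row) for row in rep]
    for v, i, j in peaks:
        sparse[i][j] = v
    return sparse


# Toy magnitude grid (assumption: stands in for an auditory spectrogram).
toy = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.1, 0.5, 0.2],
]
sparse = sketch_by_peak_picking(toy, n_atoms=2)
```

In this toy grid only two cells are strict local maxima, so both are kept and all other coefficients are zeroed. The number of atoms retained (`n_atoms`) is the quantity that would be varied to probe how sparse a sketch can become before a listener's performance degrades.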


Sparse Representation, Perceptual Task, Speech Intelligibility, Cortical Representation, Human Listener
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Clara Suied (1, 3)
  • Angélique Drémeau (2, 4)
  • Daniel Pressnitzer (1)
  • Laurent Daudet (2)
  1. Laboratoire Psychologie de la Perception, CNRS - Université Paris Descartes & Ecole Normale Supérieure, Paris Cedex 5, France
  2. Institut Langevin, ESPCI ParisTech and CNRS UMR 7587, Université Paris Diderot, Paris, France
  3. Département Action et Cognition en Situation Opérationnelle, Institut de Recherche Biomédicale des Armées (IRBA), Brétigny sur Orge, France
  4. Institut Mines-Telecom - Telecom ParisTech - CNRS/LTCI UMR 5141, Paris, France