
A proposal and evaluation of new timbre visualization methods for audio sample browsers

  • Original Article
  • Published in Personal and Ubiquitous Computing

Abstract

Searching through vast libraries of sound samples can be a daunting and time-consuming task. Modern audio sample browsers use mappings between acoustic properties and visual attributes to visually differentiate displayed items. There are few studies focused on how well these mappings help users search for a specific sample. We propose new methods for generating textural labels and positioning samples based on perceptual representations of timbre. We perform a series of studies to evaluate the benefits of using shape, color, or texture as labels in a known-item search task. We describe the motivation and implementation of the study, and present an in-depth analysis of results. We find that shape significantly improves task performance, while color and texture have little effect. We also compare results between in-person and online participants and propose research directions for further studies.


Figures 1–6 (available in the full article)

Notes

  1. Available for download from https://github.com/NECOTIS/ERBlet-Cochlear-Filterbank

  2. For a linear regression model to be considered appropriate, the distribution of prediction errors (residuals) should resemble a normal distribution [37].

  3. Available for download from https://github.com/NECOTIS/timbre-visualisation-study
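The residual check described in note 2 can be sketched as follows (a minimal Python illustration with synthetic data; the paper's own analysis was done in R): fit a least-squares line, compute the residuals, and test them for normality.

```python
# Illustration of note 2 (synthetic data): when a linear model is
# appropriate, its prediction errors should look normally distributed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=200)  # linear trend + Gaussian noise

fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)

# Least-squares residuals are centred on zero; a Shapiro-Wilk test
# (or a Q-Q plot) then checks whether they resemble a normal distribution.
stat, p = stats.shapiro(residuals)
print(f"residual mean = {residuals.mean():.4f}, Shapiro-Wilk p = {p:.3f}")
```

A Q-Q plot of the residuals conveys the same information visually and is often easier to interpret than a single test statistic.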

References

  1. Abdelmounaime S, Dong-Chen H (2013) New Brodatz-based image databases for grayscale color and multiband texture analysis. Int Sch Res Not Machine Vision 2013:1–14. https://doi.org/10.1155/2013/876386

  2. Adeli M, Rouat J, Molotchnikoff S (2014) Audiovisual correspondence between musical timbre and visual shapes. Front Hum Neurosci, 8. https://doi.org/10.3389/fnhum.2014.00352

  3. Adeli M, Rouat J, Wood S, Molotchnikoff S, Plourde E (2016) A flexible bio-inspired hierarchical model for analyzing musical timbre. IEEE/ACM Trans Audio Speech Language Process 24(5):875–889. https://doi.org/10.1109/TASLP.2016.2530405

  4. Ahlberg C, Shneiderman B. Visual information seeking: tight coupling of dynamic query filters with starfield displays. In: Readings in human-computer interaction, interactive technologies. Morgan Kaufmann, pp 450–456

  5. Bates D, Mächler M, Bolker B, Walker S (2015) Fitting linear mixed-effects models using lme4. J of Stat Softw 67(1):1–48. https://doi.org/10.18637/jss.v067.i01

  6. Berthaut F, Desainte-Catherine M, Hachet M (2010) Combining audiovisual mappings for 3D musical interaction. In: Int computer music conf. New York, USA, ICMC ’10, p 9

  7. Borgo R, Kehrer J, Chung DHS, Maguire E, Laramee RS, Hauser H, Ward M, Chen M (2012) Glyph-based visualization: foundations, design guidelines, techniques and applications. In: Eurographics 2013 - State of the Art Reports, 25 pages. https://doi.org/10.2312/CONF/EG2013/STARS/039-063

  8. Box GEP, Cox DR (1964) An analysis of transformations. J Royal Stat Soc Series B 26(2):211–252. http://www.jstor.org/stable/2984418, Accessed 2019-11-29

  9. Brazil E, Fernstrom M (2003) Audio information browsing with the Sonic Browser. In: Proc Coord and Mult Views Conf, vol 2003, pp 26–31

  10. Bryer J (2019) likert: analysis and visualization of Likert items. http://github.com/jbryer/likert

  11. Callaghan TC (1989) Interference and dominance in texture segregation: hue, geometric form, and line orientation. Percept Psychophys 46(4):299–311

  12. Cant JS, Large ME, McCall L, Goodale MA (2008) Independent processing of form, colour, and texture in object perception. Perception 37(1):57–78

  13. Chen M, Floridi L (2013) An analysis of information visualisation. Synthese 190(16):3421–3438. https://doi.org/10.1007/s11229-012-0183-y

  14. Engel J, Resnick C, Roberts A, Dieleman S, Norouzi M, Eck D, Simonyan K (2017) Neural audio synthesis of musical notes with wavenet autoencoders. In: Proc 34th int conf on mach learn - vol 70, JMLR.org, ICML’17, pp 1068–1077

  15. Font F (2010) Design and evaluation of a visualization interface for querying large unstructured sound databases. Master's thesis, Universitat Pompeu Fabra, Barcelona

  16. Font F, Bandiera G (2017) Freesound explorer: make music while discovering freesound! In: Web Audio Conf. WAC 2017. London

  17. Font F, Roma G, Serra X (2013) Freesound technical demo. In: Proc 21st ACM int conf on multimedia MM ’13. https://doi.org/10.1145/2502081.2502245. ACM Press, Barcelona, pp 411–412

  18. Frisson C, Dupont S, Yvart W, Riche N, Siebert X, Dutoit T (2014) Audiometro: directing search for sound designers through content-based cues. In: Proc 9th audio mostly conf AM ’14. ACM, New York, pp 1:1–1:8. https://doi.org/10.1145/2636879.2636880

  19. Gatys LA, Ecker AS, Bethge M (2016) Image style transfer using convolutional neural networks. In: IEEE conf on computer vision and pattern recognition (CVPR)

  20. Giannakis K (2006) A comparative evaluation of auditory-visual mappings for sound visualisation. Organised Sound 11(3):297–307

  21. Grill T (2012) Constructing high-level perceptual audio descriptors for textural sounds. In: Proc. of the 9th sound and music comput. conf. (SMC 2012), Copenhagen, pp 486–493

  22. Grill T, Flexer A (2012) Visualization of perceptual qualities in textural sounds. In: Int computer music conf, ICMC ’12

  23. Grill T, Flexer A, Cunningham S (2011) Identification of perceptual qualities in textural sounds using the repertory grid method. In: Proc 6th audio mostly conf AM ’11. ACM Press, Coimbra, pp 67–74. https://doi.org/10.1145/2095667.2095677

  24. Heise S, Hlatky M, Loviscach J (2008) Soundtorch: quick browsing in large audio collections. In: Proc 125th conv of the audio eng soc (2008), Paper 7544, p 8

  25. Heise S, Hlatky M, Loviscach J (2009) Aurally and visually enhanced audio search with soundtorch. In: CHI ’09 extended abstracts on human factors in computing systems CHI EA ’09. ACM, New York, pp 3241–3246. https://doi.org/10.1145/1520340.1520465

  26. Hyndman R, Athanasopoulos G, Bergmeir C, Caceres G, Chhay L, O’Hara-Wild M, Petropoulos F, Razbash S, Wang E, Yasmeen F (2019) forecast: forecasting functions for time series and linear models. http://pkg.robjhyndman.com/forecast, Accessed 2019-11-29

  27. Jin X, Han J (2010) K-medoids clustering. In: Sammut C, Webb GI (eds) Encyclopedia of machine learning. Springer, Boston, pp 564–565

  28. Phillips K (2011) Toxiclibs.js - open-source library for computational design. www.haptic-data.com/toxiclibsjs, Accessed 2019-11-29

  29. Kruskal WH, Wallis WA (1952) Use of ranks in one-criterion variance analysis. J of the Am Stat Assoc 47 (260):583–621. https://doi.org/10.1080/01621459.1952.10483441

  30. Kuznetsova A, Brockhoff PB, Christensen RHB (2017) lmerTest package: tests in linear mixed effects models. J of Stat Softw 82(13):1–26. https://doi.org/10.18637/jss.v082.i13

  31. Lange K, Kühn S, Filevich E (2015) Just another tool for online studies (JATOS): an easy solution for setup and management of web servers supporting online studies. PLOS ONE 10(6):1–14. https://doi.org/10.1371/journal.pone.0130834

  32. de Leeuw JR (2015) jsPsych: a JavaScript library for creating behavioral experiments in a Web browser. Behav Res Methods 47(1):1–12

  33. Lenth R (2019) emmeans: estimated marginal means, aka least-squares means. https://CRAN.R-project.org/package=emmeans, Accessed 2019-11-29

  34. Li Y, Fang C, Yang J, Wang Z, Lu X, Yang MH (2017) Universal style transfer via feature transforms. Adv Neural Inf Process Syst 30:386–396

  35. van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605

  36. Mann HB, Whitney DR (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 18(1):50–60. www.jstor.org/stable/2236101

  37. Martin J, de Adana DDR, Asuero AG (2017) Fitting models to data: residual analysis, a primer. In: Hessling JP (ed) Uncertainty quantification and model calibration, chap 7. IntechOpen, Rijeka. https://doi.org/10.5772/68049

  38. McAdams S, Winsberg S, Donnadieu S, De Soete G, Krimphoff J (1995) Perceptual scaling of synthesized musical timbres: common dimensions, specificities, and latent subject classes. Psych Res 58(3):177–192

  39. McCarthy L (2013) p5.js | home. www.p5js.org/, Accessed 2019-11-29

  40. McDonald K, Tan M (2018) The infinite drum machine. https://experiments.withgoogle.com/drum-machine, Accessed 2020-01-28

  41. McInnes L, Healy J, Saul N, Grossberger L (2018) UMAP: uniform manifold approximation and projection. J Open Source Softw 3(29):861. https://doi.org/10.21105/joss.00861

  42. Mörchen F, Ultsch A, Nöcker M, Stamm C (2005) Databionic visualization of music collections according to perceptual distance. In: Int Soc Music Info Retrieval, ISMIR ’05

  43. Pampalk E, Rauber A, Merkl D (2002) Content-based organization and visualization of music archives. In: MULTIMEDIA ’02

  44. R Core Team (2019) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna

  45. Richan E, Rouat J (2019) A study comparing shape, colour and texture as visual labels in audio sample browsers. In: Proc 14th int audio mostly conf: a journey in sound, AM’19. ACM, pp 223–226. https://doi.org/10.1145/3356590.3356624

  46. Richan E, Rouat J (2019) Timbre visualisation study - supplementary materials. https://doi.org/10.17605/OSF.IO/FKNHR, https://osf.io/fknhr, Accessed 2019-11-29

  47. Roma G, Green O, Tremblay PA (2019) Adaptive mapping of sound collections for data-driven musical interfaces. In: New Interfaces musical expression, NIME ’19

  48. Schwarz D, Schnell N (2009) Sound search by content-based navigation in large databases. In: Proc 6th sound music computing conf, SMC ’09

  49. Schwarz D, Beller G, Verbrugghe B, Britton S (2006) Real-time corpus-based concatenative synthesis with CataRT. In: Proc 9th int conf on digital audio effects, DAFx-06

  50. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 [cs]

  51. Soraghan S (2014) Animating timbre - a user study. In: Int computer music conf, ICMC ’14

  52. Stober S, Nürnberger A (2010) Musicgalaxy: a multi-focus zoomable interface for multi-facet exploration of music collections. In: CMMR 2010

  53. Stober S, Low T, Gossen T, Nürnberger A (2013) Incremental visualization of growing music collections. In: Int Soc Music Info Retrieval, ISMIR ’13

  54. Walker R (1987) The effects of culture, environment, age, and musical training on choices of visual metaphors for sound. Percept Psychophys 42(5):491–502. https://doi.org/10.3758/BF03209757

  55. Ward MO (2008) Multivariate data glyphs: principles and practice. In: Handbook of data visualization. Springer, Berlin, pp 179–198. https://doi.org/10.1007/978-3-540-33037-0_8

  56. Wasserstein RL, Lazar NA (2016) The ASA statement on p-values: context, process, and purpose. Am Stat 70(2):129–133. https://doi.org/10.1080/00031305.2016.1154108

  57. Wickham H (2016) ggplot2: elegant graphics for data analysis. Springer, New York. https://ggplot2.tidyverse.org

  58. Wickham H, François R, Henry L, Müller K (2019) dplyr: a grammar of data manipulation. https://CRAN.R-project.org/package=dplyr

  59. Köhler W (1947) Gestalt psychology: an introduction to new concepts in modern psychology. Liveright, New York

  60. XLN Audio (2019) XO - XLN audio. https://www.xlnaudio.com/products/xo, Accessed 2020-01-28

Acknowledgments

We thank all of our participants for taking the time to complete our study. We also thank our reviewers for their constructive feedback. Thanks to the members of the NECOTIS laboratory of the University of Sherbrooke who beta-tested the study and provided feedback. We thank CIRMMT for providing access to their research infrastructure and for travel funding. We also thank Frédéric Lavoie and the GRPA of the University of Sherbrooke for generously lending us their testing facilities. Special thanks to Felix Camirand Lemyre for his advice on statistical modeling and analysis.

Funding

This work is partly funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Fonds de recherche du Québec – Nature et technologies (FRQNT).

Author information

Corresponding author

Correspondence to Etienne Richan.

Ethics declarations

The studies conducted were approved by the Comité d’éthique de la recherche - Lettres et sciences humaines of the University of Sherbrooke (ethical certificate number 2018-1795).

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

(PDF 162 KB)

Appendix: R packages

We use R [44] for our data analysis and figures. We use forecast [26] to estimate optimal Box-Cox transform parameters and to perform the forward and inverse transformations. We fit linear mixed-effects models with lme4 [5] and lmerTest [30]. Estimated marginal means and confidence intervals of fitted models are calculated with emmeans [33]. Figures were produced with ggplot2 [57] and likert [10], and dplyr [58] is used for data wrangling.
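As a rough illustration of the Box-Cox step (a Python analogue of the R/forecast workflow described above; the data here are synthetic and hypothetical), SciPy can estimate the transform parameter by maximum likelihood and invert the transform:

```python
# Python analogue of the Box-Cox step (the paper's pipeline used R's
# forecast package); synthetic, positively skewed "search time" data.
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

rng = np.random.default_rng(42)
times = rng.lognormal(mean=2.0, sigma=0.5, size=500)

transformed, lam = stats.boxcox(times)    # lambda chosen by maximum likelihood
recovered = inv_boxcox(transformed, lam)  # inverse transform round-trips

print(f"estimated lambda = {lam:.3f}")
```

The subsequent mixed-model and estimated-marginal-means steps have Python counterparts in statsmodels, but the R packages cited above are what the analysis actually used.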

About this article

Cite this article

Richan, E., Rouat, J. A proposal and evaluation of new timbre visualization methods for audio sample browsers. Pers Ubiquit Comput 25, 723–736 (2021). https://doi.org/10.1007/s00779-020-01388-1

