
A proposal and evaluation of new timbre visualization methods for audio sample browsers

  • Original Article
  • Published in Personal and Ubiquitous Computing

Abstract

Searching through vast libraries of sound samples can be a daunting and time-consuming task. Modern audio sample browsers use mappings between acoustic properties and visual attributes to visually differentiate displayed items. There are few studies focused on how well these mappings help users search for a specific sample. We propose new methods for generating textural labels and positioning samples based on perceptual representations of timbre. We perform a series of studies to evaluate the benefits of using shape, color, or texture as labels in a known-item search task. We describe the motivation and implementation of the study, and present an in-depth analysis of results. We find that shape significantly improves task performance, while color and texture have little effect. We also compare results between in-person and online participants and propose research directions for further studies.


Figures 1–6 (available in the full article)

Notes

  1. Available for download from https://github.com/NECOTIS/ERBlet-Cochlear-Filterbank

  2. For a linear regression model to be considered appropriate, the distribution of prediction errors (residuals) should resemble a normal distribution [37].

  3. Available for download from https://github.com/NECOTIS/timbre-visualisation-study
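The residual check described in note 2 can be sketched as follows (a minimal Python illustration with synthetic data; the paper's own analysis was done in R): fit a least-squares line, compute the residuals, and test them for normality.

```python
# Illustration of note 2 (synthetic data): when a linear model is
# appropriate, its prediction errors should look normally distributed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=200)  # linear trend + Gaussian noise

fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)

# Least-squares residuals are centred on zero; a Shapiro-Wilk test
# (or a Q-Q plot) then checks whether they resemble a normal distribution.
stat, p = stats.shapiro(residuals)
print(f"residual mean = {residuals.mean():.4f}, Shapiro-Wilk p = {p:.3f}")
```

A Q-Q plot of the residuals conveys the same information visually and is often easier to interpret than a single test statistic.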

References

  1. Abdelmounaime S, Dong-Chen H (2013) New Brodatz-based image databases for grayscale color and multiband texture analysis. Int Sch Res Not Machine Vision 2013:1–14. https://doi.org/10.1155/2013/876386

  2. Adeli M, Rouat J, Molotchnikoff S (2014) Audiovisual correspondence between musical timbre and visual shapes. Front Hum Neurosci, 8. https://doi.org/10.3389/fnhum.2014.00352

  3. Adeli M, Rouat J, Wood S, Molotchnikoff S, Plourde E (2016) A flexible bio-inspired hierarchical model for analyzing musical timbre. IEEE/ACM Trans Audio Speech Language Process 24(5):875–889. https://doi.org/10.1109/TASLP.2016.2530405

  4. Ahlberg C, Shneiderman B. Visual information seeking: tight coupling of dynamic query filters with starfield displays. In: Readings in human-computer interaction, interactive technologies. Morgan Kaufmann, pp 450–456

  5. Bates D, Mächler M, Bolker B, Walker S (2015) Fitting linear mixed-effects models using lme4. J of Stat Softw 67(1):1–48. https://doi.org/10.18637/jss.v067.i01

  6. Berthaut F, Desainte-Catherine M, Hachet M (2010) Combining audiovisual mappings for 3D musical interaction. In: Int computer music conf. New York, USA, ICMC ’10, p 9

  7. Borgo R, Kehrer J, Chung DHS, Maguire E, Laramee RS, Hauser H, Ward M, Chen M (2012) Glyph-based visualization: foundations, design guidelines, techniques and applications. In: Eurographics 2013 - State of the Art Reports, 25 pages. https://doi.org/10.2312/CONF/EG2013/STARS/039-063

  8. Box GEP, Cox DR (1964) An analysis of transformations. J Royal Stat Soc Series B 26(2):211–252. http://www.jstor.org/stable/2984418, Accessed 2019-11-29

  9. Brazil E, Fernstrom M (2003) Audio information browsing with the Sonic Browser. In: Proc Coord and Mult Views Conf, vol 2003, pp 26–31

  10. Bryer J (2019) likert: analysis and visualization of Likert items. http://github.com/jbryer/likert

  11. Callaghan TC (1989) Interference and dominance in texture segregation: hue, geometric form, and line orientation. Percept Psychophys 46(4):299–311

  12. Cant JS, Large ME, McCall L, Goodale MA (2008) Independent processing of form, colour, and texture in object perception. Perception 37(1):57–78

  13. Chen M, Floridi L (2013) An analysis of information visualisation. Synthese 190(16):3421–3438. https://doi.org/10.1007/s11229-012-0183-y

  14. Engel J, Resnick C, Roberts A, Dieleman S, Norouzi M, Eck D, Simonyan K (2017) Neural audio synthesis of musical notes with wavenet autoencoders. In: Proc 34th int conf on mach learn - vol 70, JMLR.org, ICML’17, pp 1068–1077

  15. Font F (2010) Design and evaluation of a visualization interface for querying large unstructured sound databases. Master's thesis, Universitat Pompeu Fabra, Barcelona

  16. Font F, Bandiera G (2017) Freesound explorer: make music while discovering freesound! In: Web Audio Conf. WAC 2017. London

  17. Font F, Roma G, Serra X (2013) Freesound technical demo. In: Proc 21st ACM int conf on multimedia MM ’13. https://doi.org/10.1145/2502081.2502245. ACM Press, Barcelona, pp 411–412

  18. Frisson C, Dupont S, Yvart W, Riche N, Siebert X, Dutoit T (2014) Audiometro: directing search for sound designers through content-based cues. In: Proc 9th audio mostly conf AM ’14. ACM, New York, pp 1:1–1:8. https://doi.org/10.1145/2636879.2636880

  19. Gatys LA, Ecker AS, Bethge M (2016) Image style transfer using convolutional neural networks. In: IEEE conf on computer vision and pattern recognition (CVPR)

  20. Giannakis K (2006) A comparative evaluation of auditory-visual mappings for sound visualisation. Organised Sound 11(3):297–307

  21. Grill T (2012) Constructing high-level perceptual audio descriptors for textural sounds. In: Proc. of the 9th sound and music comput. conf. (SMC 2012), Copenhagen, pp 486–493

  22. Grill T, Flexer A (2012) Visualization of perceptual qualities in textural sounds. In: Int computer music conf, ICMC ’12

  23. Grill T, Flexer A, Cunningham S (2011) Identification of perceptual qualities in textural sounds using the repertory grid method. In: Proc 6th audio mostly conf AM ’11. ACM Press, Coimbra, pp 67–74. https://doi.org/10.1145/2095667.2095677

  24. Heise S, Hlatky M, Loviscach J (2008) Soundtorch: quick browsing in large audio collections. In: Proc 125th conv of the audio eng soc (2008), Paper 7544, p 8

  25. Heise S, Hlatky M, Loviscach J (2009) Aurally and visually enhanced audio search with soundtorch. In: CHI ’09 extended abstracts on human factors in computing systems CHI EA ’09. ACM, New York, pp 3241–3246. https://doi.org/10.1145/1520340.1520465

  26. Hyndman R, Athanasopoulos G, Bergmeir C, Caceres G, Chhay L, O’Hara-Wild M, Petropoulos F, Razbash S, Wang E, Yasmeen F (2019) forecast: forecasting functions for time series and linear models. http://pkg.robjhyndman.com/forecast, Accessed 2019-11-29

  27. Jin X, Han J (2010) K-medoids clustering. In: Sammut C, Webb GI (eds) Encyclopedia of machine learning. Springer, Boston, pp 564–565

  28. Phillips K (2011) Toxiclibs.js - open-source library for computational design. www.haptic-data.com/toxiclibsjs, Accessed 2019-11-29

  29. Kruskal WH, Wallis WA (1952) Use of ranks in one-criterion variance analysis. J of the Am Stat Assoc 47 (260):583–621. https://doi.org/10.1080/01621459.1952.10483441

  30. Kuznetsova A, Brockhoff PB, Christensen RHB (2017) lmerTest package: tests in linear mixed effects models. J of Stat Softw 82(13):1–26. https://doi.org/10.18637/jss.v082.i13

  31. Lange K, Kühn S, Filevich E (2015) Just another tool for online studies (JATOS): an easy solution for setup and management of web servers supporting online studies. PLOS ONE 10(6):1–14. https://doi.org/10.1371/journal.pone.0130834

  32. de Leeuw JR (2015) jsPsych: a JavaScript library for creating behavioral experiments in a Web browser. Behav Res Methods 47(1):1–12

  33. Lenth R (2019) emmeans: estimated marginal means, aka least-squares means. https://CRAN.R-project.org/package=emmeans, Accessed 2019-11-29

  34. Li Y, Fang C, Yang J, Wang Z, Lu X, Yang MH (2017) Universal style transfer via feature transforms. Adv Neural Inf Process Syst 30:386–396

  35. van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605

  36. Mann HB, Whitney DR (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 18(1):50–60. www.jstor.org/stable/2236101

  37. Martin J, de Adana DDR, Asuero AG (2017) Fitting models to data: residual analysis, a primer. In: Hessling JP (ed) Uncertainty quantification and model calibration, chap 7. IntechOpen, Rijeka. https://doi.org/10.5772/68049

  38. McAdams S, Winsberg S, Donnadieu S, De Soete G, Krimphoff J (1995) Perceptual scaling of synthesized musical timbres: common dimensions, specificities, and latent subject classes. Psych Res 58(3):177–192

  39. McCarthy L (2013) p5.js | home. www.p5js.org/, Accessed 2019-11-29

  40. McDonald K, Tan M (2018) The infinite drum machine. https://experiments.withgoogle.com/drum-machine, Accessed 2020-01-28

  41. McInnes L, Healy J, Saul N, Grossberger L (2018) UMAP: uniform manifold approximation and projection. J Open Source Softw 3(29):861. https://doi.org/10.21105/joss.00861

  42. Mörchen F, Ultsch A, Nöcker M, Stamm C (2005) Databionic visualization of music collections according to perceptual distance. In: Int Soc Music Info Retrieval, ISMIR ’05

  43. Pampalk E, Rauber A, Merkl D (2002) Content-based organization and visualization of music archives. In: MULTIMEDIA ’02

  44. R Core Team (2019) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna

  45. Richan E, Rouat J (2019) A study comparing shape, colour and texture as visual labels in audio sample browsers. In: Proc 14th int audio mostly conf: a journey in sound, AM’19. ACM, pp 223–226. https://doi.org/10.1145/3356590.3356624

  46. Richan E, Rouat J (2019) Timbre visualisation study - supplementary materials. https://doi.org/10.17605/OSF.IO/FKNHR, https://osf.io/fknhr, Accessed 2019-11-29

  47. Roma G, Green O, Tremblay PA (2019) Adaptive mapping of sound collections for data-driven musical interfaces. In: New Interfaces musical expression, NIME ’19

  48. Schwarz D, Schnell N (2009) Sound search by content-based navigation in large databases. In: Proc 6th sound music computing conf, SMC ’09

  49. Schwarz D, Beller G, Verbrugghe B, Britton S (2006) Real-time corpus-based concatenative synthesis with CataRT. In: Proc 9th int conf on digital audio effects, DAFx-06

  50. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 [cs]

  51. Soraghan S (2014) Animating timbre - a user study. In: Int computer music conf, ICMC ’14

  52. Stober S, Nürnberger A (2010) Musicgalaxy: a multi-focus zoomable interface for multi-facet exploration of music collections. In: CMMR 2010

  53. Stober S, Low T, Gossen T, Nürnberger A (2013) Incremental visualization of growing music collections. In: Int Soc Music Info Retrieval, ISMIR ’13

  54. Walker R (1987) The effects of culture, environment, age, and musical training on choices of visual metaphors for sound. Percept Psychophys 42(5):491–502. https://doi.org/10.3758/BF03209757

  55. Ward MO (2008) Multivariate data glyphs: principles and practice. In: Handbook of data visualization. Springer, Berlin, pp 179–198. https://doi.org/10.1007/978-3-540-33037-0_8

  56. Wasserstein RL, Lazar NA (2016) The ASA statement on p-values: context, process, and purpose. Am Stat 70(2):129–133. https://doi.org/10.1080/00031305.2016.1154108

  57. Wickham H (2016) ggplot2: elegant graphics for data analysis. Springer, New York. https://ggplot2.tidyverse.org

  58. Wickham H, François R, Henry L, Müller K (2019) dplyr: a grammar of data manipulation. https://CRAN.R-project.org/package=dplyr

  59. Köhler W (1947) Gestalt psychology: an introduction to new concepts in modern psychology. Liveright, New York

  60. XLN Audio (2019) XO - XLN audio. https://www.xlnaudio.com/products/xo, Accessed 2020-01-28

Acknowledgments

We thank all of our participants for taking the time to complete our study. We also thank our reviewers for their constructive feedback. Thanks to the members of the NECOTIS laboratory of the University of Sherbrooke who beta-tested the study and provided feedback. We thank CIRMMT for providing access to their research infrastructure and for travel funding. We also thank Frédéric Lavoie and the GRPA of the University of Sherbrooke for generously lending us their testing facilities. Special thanks to Felix Camirand Lemyre for his advice on statistical modeling and analysis.

Funding

This work is partly funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Fonds de recherche du Québec – Nature et technologies (FRQNT).

Author information

Corresponding author

Correspondence to Etienne Richan.

Ethics declarations

The studies conducted were approved by the Comité d’éthique de la recherche - Lettres et sciences humaines of the University of Sherbrooke (ethical certificate number 2018-1795).

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

(PDF 162 KB)

Appendix: R packages

We use R [44] for our data analysis and figures. We use forecast [26] to estimate optimal Box-Cox transform parameters and to perform the forward and inverse transformations. We fit linear mixed-effects models with lme4 [5] and lmerTest [30]. Estimated marginal means and confidence intervals of fitted models are calculated with emmeans [33]. Figures were produced with ggplot2 [57] and likert [10], and dplyr [58] is used for data wrangling.
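As a rough illustration of the Box-Cox step (a Python analogue of the R/forecast workflow described above; the data here are synthetic and hypothetical), SciPy can estimate the transform parameter by maximum likelihood and invert the transform:

```python
# Python analogue of the Box-Cox step (the paper's pipeline used R's
# forecast package); synthetic, positively skewed "search time" data.
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

rng = np.random.default_rng(42)
times = rng.lognormal(mean=2.0, sigma=0.5, size=500)

transformed, lam = stats.boxcox(times)    # lambda chosen by maximum likelihood
recovered = inv_boxcox(transformed, lam)  # inverse transform round-trips

print(f"estimated lambda = {lam:.3f}")
```

The subsequent mixed-model and estimated-marginal-means steps have Python counterparts in statsmodels, but the R packages cited above are what the analysis actually used.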

About this article

Cite this article

Richan, E., Rouat, J. A proposal and evaluation of new timbre visualization methods for audio sample browsers. Pers Ubiquit Comput 25, 723–736 (2021). https://doi.org/10.1007/s00779-020-01388-1

