Sound Sharing and Retrieval

Abstract

Multimedia sharing has grown enormously in recent years, and sound sharing is no exception. Online sound sharing sites now let users search, browse, and contribute large amounts of audio content such as sound effects, field and urban recordings, music tracks, and music samples. Enabling search, discovery, and ultimately reuse of this content poses many challenges. In this chapter we give an overview of different ways to approach these challenges. We describe how to build an audio database, outlining the aspects that need to be taken into account. We discuss metadata-based descriptions of audio content and the searching and browsing techniques that can be used to navigate the database. Beyond metadata, we present sound retrieval techniques based on audio features extracted from (possibly) unannotated audio. We end the chapter by discussing advanced approaches to sound retrieval and by drawing some conclusions about the present and future of sound sharing and retrieval. Alongside our explanations, we provide code examples that illustrate some of the concepts discussed.
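
As a minimal sketch of the feature-based retrieval idea summarized above (this is not the chapter's own code), the following Python snippet ranks the sounds in a toy database by cosine similarity to a query feature vector, which is one simple way to implement query by example. The file names, the three-dimensional feature vectors, and the function names cosine_similarity and query_by_example are hypothetical and chosen only for illustration; in a real system the vectors would be produced by an audio analysis tool.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def query_by_example(query_vector, database, k=2):
    """Return the k (name, score) pairs most similar to query_vector."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in database.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy database: each sound is described by a made-up feature vector.
database = {
    "dog_bark.wav": [0.9, 0.1, 0.3],
    "door_slam.wav": [0.2, 0.8, 0.5],
    "dog_growl.wav": [0.8, 0.2, 0.4],
}

# Hypothetical features of the example sound used as the query.
results = query_by_example([0.85, 0.15, 0.35], database, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

With these made-up vectors, the two dog sounds rank above the door slam because their feature vectors point in nearly the same direction as the query. A linear scan like this is only practical for small collections; large databases typically rely on an index for nearest-neighbor search.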

Keywords

  • Sound sharing
  • Sound retrieval
  • Multimedia
  • Audio metadata
  • Sound description
  • Audio database
  • Audio indexing
  • Audio features
  • Similarity search
  • Query by example
  • Sound taxonomy
  • Machine learning
  • Sound exploration
  • Sound search

Notes

  1. https://freesound.org, https://looperman.com, http://ccmixter.org, http://aporee.org/maps.

  2. https://sounddogs.com, https://soundsnap.com, https://asoundeffect.com.

  3. https://python.org.

  4. www.TODO:bookwebsite.

  5. https://freesound.org/docs/api/resources_apiv2.html.

  6. https://creativecommons.org.

  7. https://github.com/MTG/essentia/tree/master/src/examples/freesound.

  8. https://youtube.com, https://vimeo.com, https://flickr.com, https://soundcloud.com, https://bandcamp.com, https://last.fm, https://freesound.org.

  9. For this reason, histograms are provided as part of the documentation of the Freesound API: https://www.freesound.org/docs/api/analysis_docs.html.

  10. https://labs.freesound.org/floop/.

  11. https://ffont.github.io/freesound-explorer/.

Author information

Corresponding author

Correspondence to Frederic Font.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Font, F., Roma, G., Serra, X. (2018). Sound Sharing and Retrieval. In: Virtanen, T., Plumbley, M., Ellis, D. (eds) Computational Analysis of Sound Scenes and Events. Springer, Cham. https://doi.org/10.1007/978-3-319-63450-0_10

  • DOI: https://doi.org/10.1007/978-3-319-63450-0_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63449-4

  • Online ISBN: 978-3-319-63450-0

  • eBook Packages: Engineering, Engineering (R0)