Multimedia Tools and Applications, Volume 58, Issue 1, pp 23–40

A fast audio similarity retrieval method for millions of music tracks

  • Dominik Schnitzer
  • Arthur Flexer
  • Gerhard Widmer


We present a filter-and-refine method to speed up nearest neighbor searches with the Kullback–Leibler divergence for multivariate Gaussians. This combination of features and similarity measure is of special interest in automatic music recommendation, where it is widely used to compute music similarity. However, because the features are non-vectorial and the divergence is non-metric, standard indexing algorithms cannot be used, which makes the approach difficult to apply to large corpora. This paper proposes a method for fast nearest neighbor retrieval in large databases that use this similarity measure. At its core, the method rescales the divergence and uses a modified FastMap implementation to speed up nearest neighbor queries. Overall, the method accelerates the search for similar music pieces by a factor of 10–30 and achieves high recall values of 95–99% compared to a standard linear search.
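The abstract names the ingredients of the method but not its mechanics. The following is a minimal Python/NumPy sketch of a generic filter-and-refine search built from those ingredients; it is not the authors' implementation. Assumptions: each track is modelled as a single full-covariance Gaussian `(mean, covariance)`; `dist` is the symmetrized Kullback–Leibler divergence in its standard closed form followed by a square root, which merely stands in for the paper's rescaling (not specified in the abstract); and `FastMapIndex` is plain FastMap (Faloutsos and Lin, 1995) with negative squared residuals clamped to zero, not the modified variant the paper describes. All names (`kl_gauss`, `dist`, `FastMapIndex`, `n_filter`, `n_nn`, `random_model`) are hypothetical.

```python
import numpy as np


def kl_gauss(p, q):
    """Closed-form KL(p || q) for full-covariance Gaussians p = (mu, cov), q = (mu, cov)."""
    (mu_p, cov_p), (mu_q, cov_q) = p, q
    inv_q = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (np.trace(inv_q @ cov_p) + diff @ inv_q @ diff
                  - len(mu_p) + logdet_q - logdet_p)


def dist(p, q):
    """Symmetrized KL divergence, rescaled by a square root.

    ASSUMPTION: the square root is a stand-in; the paper's own rescaling is not given here.
    """
    skl = kl_gauss(p, q) + kl_gauss(q, p)
    return float(np.sqrt(max(skl, 0.0)))


class FastMapIndex:
    """Hypothetical filter-and-refine index: plain FastMap plus exact re-ranking."""

    def __init__(self, models, k=8, seed=0):
        self.models = models
        n = len(models)
        self.coords = np.zeros((n, k))
        self.pivots = []  # (index_a, index_b, residual distance a-b) per dimension
        rng = np.random.default_rng(seed)
        for dim in range(k):
            # Pivot heuristic: random object -> farthest object a -> object b farthest from a.
            a = int(np.argmax(self._residual_row(int(rng.integers(n)), dim)))
            row_a = self._residual_row(a, dim)
            b = int(np.argmax(row_a))
            d_ab = row_a[b]
            if d_ab <= 1e-12:  # residual space has collapsed: stop adding dimensions
                self.coords = self.coords[:, :dim]
                break
            row_b = self._residual_row(b, dim)
            # FastMap projection onto the line through pivots a and b (cosine law).
            self.coords[:, dim] = (row_a**2 + d_ab**2 - row_b**2) / (2 * d_ab)
            self.pivots.append((a, b, d_ab))

    def _residual_row(self, i, dim):
        """Distances from object i to all objects after removing the first `dim` coordinates.

        The divergence is not a metric, so negative squared residuals are clamped to zero.
        """
        full = np.array([dist(self.models[i], m) for m in self.models])
        proj = np.sum((self.coords[:, :dim] - self.coords[i, :dim]) ** 2, axis=1)
        return np.sqrt(np.maximum(full**2 - proj, 0.0))

    def project(self, q):
        """Map a query model into FastMap space using 2 exact divergences per dimension."""
        x = np.zeros(len(self.pivots))
        for dim, (a, b, d_ab) in enumerate(self.pivots):
            da2 = dist(q, self.models[a]) ** 2 - np.sum((x[:dim] - self.coords[a, :dim]) ** 2)
            db2 = dist(q, self.models[b]) ** 2 - np.sum((x[:dim] - self.coords[b, :dim]) ** 2)
            x[dim] = (max(da2, 0.0) + d_ab**2 - max(db2, 0.0)) / (2 * d_ab)
        return x

    def query(self, q, n_nn=10, n_filter=100):
        """Filter with cheap Euclidean distances, then refine candidates with the exact divergence."""
        x = self.project(q)
        approx = np.sum((self.coords - x) ** 2, axis=1)
        candidates = np.argsort(approx)[:n_filter]
        exact = sorted((dist(q, self.models[i]), int(i)) for i in candidates)
        return [i for _, i in exact[:n_nn]]


def random_model(rng, d=4):
    """Toy Gaussian with a random mean and a symmetric positive-definite covariance."""
    a = rng.normal(size=(d, d))
    return rng.normal(size=d), a @ a.T + d * np.eye(d)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    models = [random_model(rng) for _ in range(200)]
    index = FastMapIndex(models, k=8)
    print(index.query(models[0], n_nn=5, n_filter=40))
```

In this sketch a query costs 2k exact divergence evaluations to project it, cheap k-dimensional Euclidean distances over the whole collection, and exact divergences only for the `n_filter` candidates. Raising `n_filter` trades speed for recall; with `n_filter` equal to the collection size the result matches a linear search.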


Keywords: Audio, Indexing, Music recommendation



This research is supported by the Austrian Research Fund (FWF) under grant L511-N15, and by the Austrian Research Promotion Agency (FFG) under project number 815474-BRIDGE.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Dominik Schnitzer (1, corresponding author)
  • Arthur Flexer (1)
  • Gerhard Widmer (2)
  1. Austrian Research Institute for Artificial Intelligence (OFAI), Vienna, Austria
  2. Department of Computational Perception, Johannes Kepler University, Linz, Austria
