SATIN: a persistent musical database for music information retrieval and a supporting deep learning experiment on song instrumental classification

Abstract

This paper introduces SATIN, the Set of Audio Tags and Identifiers Normalized. SATIN is a database of 400k audio-related metadata entries and identifiers that aims to facilitate reproducibility and comparison among Music Information Retrieval (MIR) algorithms. The idea is to take advantage of partnerships between scientists and private companies that host millions of tracks. Scientists can send their feature extraction algorithms to companies along with SATIN identifiers and retrieve the corresponding features. This procedure gives the MIR community access to more tracks for classification purposes. Afterwards, scientists can provide the MIR community with the classification result for each track, which can then be compared with the results of other algorithms. SATIN thus addresses the major problems of accessing more tracks, managing copyright locks, saving computation time, and guaranteeing consistency across research databases. We introduce SOFT1, the first Set Of FeaTures extracted by a company thanks to SATIN. We propose a supporting experiment classifying instrumentals and songs to illustrate a possible use of SATIN. We compare a deep learning approach, which has emerged in recent years in MIR, with a knowledge-based approach.
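The exchange described above (a scientist ships a feature extractor together with SATIN identifiers, the company runs it on its copyrighted audio and returns only the extracted features) can be sketched as follows. This is a minimal illustration of the protocol's data flow, not SATIN's actual API: the ISRCs, the toy extractor, and the in-memory track store are all hypothetical stand-ins.

```python
# Hypothetical sketch of the SATIN exchange: the scientist never touches
# the audio, only identifiers going out and features coming back.

def extract_features(audio):
    """Placeholder extractor a scientist would ship to the company.

    Here it just returns the mean of a (simulated) signal; a real
    extractor would compute e.g. spectral descriptors.
    """
    return sum(audio) / len(audio)

def company_side(isrcs, extractor, track_store):
    """Run the scientist's extractor on audio held only by the company."""
    return {isrc: extractor(track_store[isrc]) for isrc in isrcs}

# Simulated catalog: the company holds the audio, the scientist only
# holds the SATIN identifiers (ISRCs below are made up).
track_store = {
    "FRZ039800212": [0.1, 0.4, 0.2],
    "USUM71703861": [0.3, 0.3, 0.6],
}
satin_ids = list(track_store)

# The scientist receives features keyed by ISRC, never the audio itself.
features = company_side(satin_ids, extract_features, track_store)
```

The returned dictionary is keyed by identifier, so any classification results the scientist later publishes can be matched track-for-track against other algorithms' results on the same identifiers.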

Notes

  1. https://www.deezer.com/features
  2. http://acousticbrainz.org/
  3. http://www.audiocontentanalysis.org/data-sets/
  4. https://acousticbrainz.org/datasets/accuracy#voice_instrumental
  5. http://www.ccmixter.org/
  6. http://www.mathieuramona.com/wp/data/jamendo
  7. https://www.jamendo.com
  8. http://medleydb.weebly.com
  9. http://www.tsi.telecom-paristech.fr/aao/en/2012/03/12/quasi/
  10. http://www.deezer.com
  11. http://www.simbals.com
  12. https://www.musixmatch.com
  13. http://isrc.ifpi.org/en
  14. http://www.ifpi.org
  15. https://musicbrainz.org/
  16. https://soundcloud.com/
  17. https://github.com/crowdAI/crowdai-musical-genre-recognition-starter-kit/blob/master/fma.py#L55
  18. https://github.com/ybayle/SATIN
  19. https://developers.deezer.com/api
  20. https://developer.spotify.com/web-api/
  21. https://musicbrainz.org/doc/Development/XML_Web_Service/Version_2
  22. https://developer.musixmatch.com
  23. https://developer.musixmatch.com/documentation/music-meta-data
  24. https://github.com/ybayle/awesome-deep-learning-music

Acknowledgements

The authors thank Musixmatch for their metadata and the Research and Development team of Deezer for extracting the audio features. The authors thank Florian Iragne from Simbals for his help with ISRC and musical metadata handling. The authors thank Fidji Berio and Kimberly Malcolm for insightful proofreading.

This work has been partially funded by the Charles University, project GA UK No. 1580317, project SVV 260451, by the internal grant agency of VŠB - Technical University of Ostrava, under the project no. SP2017/177 “Optimization of machine learning algorithms for the HPC platform”, by The Ministry of Education, Youth and Sports of the Czech Republic from the National Programme of Sustainability (NPU II) project “IT4Innovations excellence in science - LQ1602” and from the Large Infrastructures for Research, Experimental Development and Innovations project “IT4Innovations National Supercomputing Center – LM2015070”. All findings and points of view expressed in this paper are those of the authors and do not necessarily reflect the views of their academic and industrial partners.

Part of the computer time for this study was provided by the computing facilities MCIA (Mésocentre de Calcul Intensif Aquitain) of the Université de Bordeaux and of the Université de Pau et des Pays de l’Adour.

Author information

Corresponding author

Correspondence to Yann Bayle.

About this article

Cite this article

Bayle, Y., Robine, M. & Hanna, P. SATIN: a persistent musical database for music information retrieval and a supporting deep learning experiment on song instrumental classification. Multimed Tools Appl 78, 2703–2718 (2019). https://doi.org/10.1007/s11042-018-5797-8

Keywords

  • Acoustic signal processing
  • Classification of instrumentals and songs
  • Content-based audio retrieval
  • Database
  • Machine learning algorithms
  • Music information retrieval
  • Music recommendation
  • Playlist generation
  • Reproducibility
  • Signal analysis
  • Signal processing algorithms
  • Music autotagging