Evolutionary Multi-objective Training Set Selection of Data Instances and Augmentations for Vocal Detection

  • Igor Vatolkin
  • Daniel Stoller
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11453)


The size of publicly available music data sets has grown significantly in recent years, which allows training better classification models. However, training on large data sets is time-intensive and cumbersome, and some training instances might be unrepresentative and thus hurt classification performance regardless of the model used. On the other hand, it is often beneficial to extend the original training data with augmentations, but only if they are carefully chosen. Therefore, identifying a “smart” selection of training instances should improve performance. In this paper, we introduce a novel multi-objective framework for training set selection that aims to simultaneously minimise the number of training instances and the classification error. Experimentally, we apply our method to vocal activity detection on a multi-track database extended with various audio augmentations for accompaniment and vocals. Results show that our approach is very effective at reducing classification error on a separate validation set, and that the resulting training set selections either reduce classification error or require only a small fraction of training instances for comparable performance.
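The two objectives named above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a selection is encoded as a binary mask over the training instances, the classifier is a toy one-dimensional nearest-centroid model, the data are invented, and an exhaustive enumeration of all masks stands in for the evolutionary search, which the abstract does not detail. The sketch only shows how training set size and validation error can be evaluated together and how the non-dominated trade-offs are kept.

```python
from itertools import product

# Hypothetical toy data: (feature, label). The last training instance is
# deliberately unrepresentative (label 0 placed among the label-1 values).
TRAIN = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1), (9.5, 0)]
VALIDATION = [(0.5, 0), (2.0, 0), (6.0, 1), (9.8, 1)]


def fit_centroids(examples):
    """Toy classifier (an assumption): store the mean feature value per label."""
    sums = {}
    for x, y in examples:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


def objectives(mask, train, validation):
    """Return (fraction of instances selected, validation error); both are minimised."""
    selected = [ex for ex, keep in zip(train, mask) if keep]
    if not selected:
        # Convention assumed here: an empty selection gets the worst error.
        return (0.0, 1.0)
    model = fit_centroids(selected)
    errors = sum(predict(model, x) != y for x, y in validation)
    return (len(selected) / len(train), errors / len(validation))


def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Exhaustive stand-in for the evolutionary search: evaluate every mask.
points = {objectives(mask, TRAIN, VALIDATION)
          for mask in product([0, 1], repeat=len(TRAIN))}
front = sorted(non_dominated(points))
print(front)
```

In this toy setting, dropping the unrepresentative instance lowers the validation error compared with using the full training set, which mirrors the motivation given in the abstract. For realistic data set sizes the enumeration is infeasible, which is why a search heuristic such as an evolutionary algorithm would be used instead.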


Keywords: Vocal detection; Evolutionary multi-objective training set selection; Data augmentation



This work was funded by the DFG (German Research Foundation, project 336599081) and by EPSRC grant EP/L01632X/1.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. TU Dortmund, Dortmund, Germany
  2. Queen Mary University of London, London, UK
