Separation of Drum and Bass from Monaural Tracks

  • Michele Scarpiniti
  • Simone Scardapane
  • Danilo Comminiello
  • Raffaele Parisi
  • Aurelio Uncini
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 102)


In this paper, we propose a deep recurrent neural network (DRNN), based on the Long Short-Term Memory (LSTM) unit, for the separation of drum and bass sources from a monaural audio track. In particular, one DRNN with six hidden layers (three feedforward and three recurrent) is used for each source to be separated. We restrict our attention to two challenging sources: drum and bass. Experimental results show the effectiveness of the proposed approach compared with another state-of-the-art method. Results are expressed in terms of well-known source separation metrics.
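To make the architecture concrete, the following is a minimal, dependency-free sketch of the forward pass of one per-source network with three feedforward and three LSTM layers. The abstract states only the layer counts and the one-network-per-source design, so everything else here is an assumption made for illustration: the layer order (feedforward first), the ReLU activation, the layer widths, the sigmoid output mask applied to a magnitude spectrogram, and the names `Dense`, `LSTM`, and `SourceNet`. The weights are random and untrained, so the output shows data flow only, not separation quality.

```python
import math
import random


def _matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def _rand_matrix(rng, rows, cols, scale=0.1):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Dense:
    """Feedforward layer applied frame by frame (ReLU is an assumption)."""

    def __init__(self, rng, n_in, n_out):
        self.W = _rand_matrix(rng, n_out, n_in)
        self.b = [0.0] * n_out

    def forward(self, frames):
        return [[max(0.0, a + b) for a, b in zip(_matvec(self.W, x), self.b)]
                for x in frames]


class LSTM:
    """Standard LSTM layer: input, forget, output gates and candidate cell."""

    def __init__(self, rng, n_in, n_hid):
        self.n_hid = n_hid
        # One weight matrix per gate, acting on the concatenation [x_t, h_{t-1}].
        self.W = {g: _rand_matrix(rng, n_hid, n_in + n_hid) for g in "ifoc"}
        self.b = {g: [0.0] * n_hid for g in "ifoc"}

    def forward(self, frames):
        h = [0.0] * self.n_hid  # hidden state
        c = [0.0] * self.n_hid  # cell state
        outputs = []
        for x in frames:
            z = x + h  # list concatenation of input frame and previous hidden state
            pre = {g: [a + b for a, b in zip(_matvec(self.W[g], z), self.b[g])]
                   for g in "ifoc"}
            i = [_sigmoid(v) for v in pre["i"]]
            f = [_sigmoid(v) for v in pre["f"]]
            o = [_sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["c"]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


class SourceNet:
    """Hypothetical per-source network: 3 Dense + 3 LSTM layers, then a mask."""

    def __init__(self, n_bins, n_hid, seed):
        rng = random.Random(seed)
        self.layers = [
            Dense(rng, n_bins, n_hid), Dense(rng, n_hid, n_hid), Dense(rng, n_hid, n_hid),
            LSTM(rng, n_hid, n_hid), LSTM(rng, n_hid, n_hid), LSTM(rng, n_hid, n_hid),
        ]
        self.W_out = _rand_matrix(rng, n_bins, n_hid)  # assumed output projection

    def separate(self, frames):
        """Return mask * mixture, with a sigmoid mask in (0, 1) per bin."""
        hidden = frames
        for layer in self.layers:
            hidden = layer.forward(hidden)
        masks = [[_sigmoid(v) for v in _matvec(self.W_out, h)] for h in hidden]
        return [[m * x for m, x in zip(mrow, xrow)] for mrow, xrow in zip(masks, frames)]


# Toy mixture "magnitude spectrogram": 5 time frames, 8 frequency bins.
mixture = [[abs(math.sin(0.3 * t + 0.7 * k)) for k in range(8)] for t in range(5)]

# One network per source, as in the abstract.
drum_estimate = SourceNet(n_bins=8, n_hid=16, seed=0).separate(mixture)
bass_estimate = SourceNet(n_bins=8, n_hid=16, seed=1).separate(mixture)
```

Each estimate has the same shape as the mixture, and because the assumed mask lies in (0, 1) every estimated bin is bounded by the corresponding mixture bin. In practice such a network would be built and trained with a deep learning framework rather than hand-written loops.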


Keywords: Deep recurrent neural networks, Long short-term memory, Monaural audio source separation, Non-negative matrix factorization



Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  • Michele Scarpiniti (1)
  • Simone Scardapane (1)
  • Danilo Comminiello (1)
  • Raffaele Parisi (1)
  • Aurelio Uncini (1)

  1. Department of Information Engineering, Electronics and Telecommunications (DIET), “Sapienza” University of Rome, Rome, Italy
