Style Transfer of Abstract Drum Patterns Using a Light-Weight Hierarchical Autoencoder

  • Conference paper
Artificial Intelligence (BNAIC 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1021)

Abstract

Many improvements have been made in the field of generative modelling. State-of-the-art unsupervised models can transfer the style of existing media with photo-realistic quality. However, these improvements have been largely limited to graphical data; music has proven more difficult to model. Magenta’s MusicVAE can generate abstract rhythms and melodies quite successfully. However, MusicVAE is a large model that requires vast amounts of computing power before it starts to make realistic predictions. Moreover, its input is heavily quantized, which makes it impossible to model musical variations such as swing. This paper proposes a lightweight but high-resolution variational recurrent autoencoder that can transfer the style of input samples while maintaining characteristics of the original sample. The model can be trained in a few hours on small datasets, which allows researchers and musicians to experiment with musical style transfer. In addition, a novel technique based on normalized compression distance is used to evaluate the model by measuring the similarity of generated samples to target classes.
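
To illustrate the evaluation idea mentioned in the abstract, the following is a minimal sketch of normalized compression distance (NCD) in its standard form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. It is not taken from the paper's implementation: the choice of zlib as compressor, the byte-string encoding of patterns, and the helper name `mean_ncd_to_class` are assumptions made for illustration only.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share most of their structure;
    values near 1 suggest they share little.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def mean_ncd_to_class(sample: bytes, class_samples: list[bytes]) -> float:
    """Hypothetical helper: average NCD from one sample to a class of samples."""
    return sum(ncd(sample, s) for s in class_samples) / len(class_samples)


if __name__ == "__main__":
    # Toy stand-ins for serialized drum patterns (not real MIDI data).
    generated = b"kick hat snare hat " * 32
    class_a = [b"kick hat snare hat " * 32, b"kick hat snare hat hat " * 28]
    class_b = [b"ride ride kick ride snare ride " * 20]
    print("distance to class A:", mean_ncd_to_class(generated, class_a))
    print("distance to class B:", mean_ncd_to_class(generated, class_b))
```

Under these assumptions, a generated sample would be counted as closer to a target class when its average distance to that class is lower than its distance to the other classes. How patterns are serialized before compression strongly affects the result, so that step would need to match the representation used by the model.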


Acknowledgments

The author wishes to thank Stefan Schlobach, Albert Meroño Peñuela and Peter Bloem for inspiration and useful discussions.

Author information

Corresponding author

Correspondence to Mark Voschezang.

A Appendix

Both the implementation of the model described in this paper and a number of synthesized examples of generated MIDI files can be found at https://github.com/voschezang/drum-style-transfer.

A.1 Parameters

Table 1 shows the values of the most important parameters.

Table 1. Parameters

A.2 Structure of the Model

The encoder and decoders can be seen as a pipeline where a sequence of transformations is applied to an input. Table 2 shows a brief overview of each layer.

Table 2. Structure of the model

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Voschezang, M. (2019). Style Transfer of Abstract Drum Patterns Using a Light-Weight Hierarchical Autoencoder. In: Atzmueller, M., Duivesteijn, W. (eds) Artificial Intelligence. BNAIC 2018. Communications in Computer and Information Science, vol 1021. Springer, Cham. https://doi.org/10.1007/978-3-030-31978-6_10

  • DOI: https://doi.org/10.1007/978-3-030-31978-6_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-31977-9

  • Online ISBN: 978-3-030-31978-6

  • eBook Packages: Computer Science, Computer Science (R0)
