Style Transfer of Abstract Drum Patterns Using a Light-Weight Hierarchical Autoencoder

Voschezang, Mark

doi:10.1007/978-3-030-31978-6_10

Mark Voschezang ORCID: orcid.org/0000-0001-5585-7907

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1021)

Included in the following conference series:

Benelux Conference on Artificial Intelligence

Abstract

Many improvements have been made in the field of generative modelling. State-of-the-art unsupervised models can transfer the style of existing media with photo-realistic quality. However, these improvements have been largely limited to graphical data; music has proven more difficult to model. Magenta’s MusicVAE can quite successfully generate abstract rhythms and melodies. However, MusicVAE is a large model that requires vast amounts of computing power before it starts to make realistic predictions. Moreover, its input is heavily quantized, which makes it impossible to model musical variations such as swing. This paper proposes a lightweight but high-resolution variational recurrent autoencoder that can transfer the style of input samples while maintaining characteristics of the original sample. The model can be trained in a few hours on small datasets, allowing researchers and musicians to experiment with musical style transfer. In addition, a novel technique based on normalized compression distance is used to evaluate the model by measuring the similarity of generated samples to target classes.
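
To make the evaluation idea more concrete, the following is a minimal sketch of normalized compression distance (NCD) in its standard form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The choice of zlib as the compressor, the function names, and the toy byte-string "patterns" are assumptions made for illustration only; the abstract does not specify the compressor or the encoding used in the paper.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share structure; values near 1
    suggest they share little that the compressor can exploit.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical text encodings of drum patterns, not the paper's format.
    generated = b"K-H-S-H-" * 60 + b"K-HHS-H-" * 4
    target_class_sample = b"K-H-S-H-" * 64
    other_class_sample = b"KK-S--HSK-S-H-SS" * 32

    print("distance to target class:", ncd(generated, target_class_sample))
    print("distance to other class: ", ncd(generated, other_class_sample))
```

Under these assumptions, a generated sample would be considered closer to a target class when its NCD to samples of that class is lower than its NCD to samples of other classes. Real compressors only approximate the ideal, so NCD(x, x) is typically small but not exactly zero.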

Acknowledgments

The author wishes to thank Stefan Schlobach, Albert Meroño Peñuela and Peter Bloem for inspiration and useful discussions.

Author information

Authors and Affiliations

VU University, 1081 HV, Amsterdam, The Netherlands
Mark Voschezang

Corresponding author

Correspondence to Mark Voschezang.

Editor information

Editors and Affiliations

Tilburg University, Tilburg, The Netherlands
Martin Atzmueller
Eindhoven University of Technology, Eindhoven, The Netherlands
Wouter Duivesteijn

A Appendix

Both the implementation of the model described in this paper and a number of synthesized examples of generated MIDI files can be found at https://github.com/voschezang/drum-style-transfer.

A.1 Parameters

Table 1 shows the values of the most important parameters.

Table 1. Parameters

A.2 Structure of the Model

The encoder and decoders can be seen as a pipeline where a sequence of transformations is applied to an input. Table 2 shows a brief overview of each layer.

Table 2. Structure of the model

About this paper

Cite this paper

Voschezang, M. (2019). Style Transfer of Abstract Drum Patterns Using a Light-Weight Hierarchical Autoencoder. In: Atzmueller, M., Duivesteijn, W. (eds) Artificial Intelligence. BNAIC 2018. Communications in Computer and Information Science, vol 1021. Springer, Cham. https://doi.org/10.1007/978-3-030-31978-6_10

DOI: https://doi.org/10.1007/978-3-030-31978-6_10
Published: 25 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31977-9
Online ISBN: 978-3-030-31978-6
eBook Packages: Computer Science (R0)
