Comparing Models for Harmony Prediction in an Interactive Audio Looper

  • Benedikte Wallace
  • Charles P. Martin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11453)

Abstract

Musicians often use tools such as loop-pedals and multitrack recorders to assist in improvisation and songwriting, but these tools generally don’t proactively contribute aspects of the musical performance. In this work, we introduce an interactive audio looper that predicts a loop’s harmony and constructs an accompaniment automatically using concatenative synthesis. The system uses a machine learning (ML) model for harmony prediction; that is, it generates a sequence of chord symbols for a given melody. We analyse the performance of two potential ML models for this task: a hidden Markov model (HMM) and a recurrent neural network (RNN) with bidirectional long short-term memory (BLSTM) cells. Our findings show that the RNN approach provides more accurate predictions and is more robust with respect to changes in the training data. We consider the impact of each model’s predictions in live performance and ask: “What is an accurate chord prediction anyway?”
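
To make the harmony prediction task concrete, the following is a minimal sketch of HMM-style chord decoding with the Viterbi algorithm. It is not the model evaluated in the paper: the three-chord vocabulary, the one-melody-note-per-bar encoding, and every probability in `START`, `TRANS` and `EMIT` are hypothetical values chosen by hand for illustration, and the function name `viterbi` is likewise an assumption.

```python
import math

# Hypothetical toy model: hidden states are chords, observations are one
# melody note per bar. All probabilities are hand-picked for illustration
# and are NOT taken from the paper.
CHORDS = ["C", "F", "G"]
START = {"C": 0.6, "F": 0.2, "G": 0.2}
TRANS = {
    "C": {"C": 0.4, "F": 0.3, "G": 0.3},
    "F": {"C": 0.4, "F": 0.3, "G": 0.3},
    "G": {"C": 0.6, "F": 0.1, "G": 0.3},
}
EMIT = {
    "C": {"C": 0.4, "E": 0.3, "G": 0.2, "F": 0.05, "A": 0.03, "B": 0.01, "D": 0.01},
    "F": {"F": 0.4, "A": 0.3, "C": 0.2, "E": 0.03, "G": 0.03, "B": 0.01, "D": 0.03},
    "G": {"G": 0.4, "B": 0.3, "D": 0.2, "C": 0.03, "E": 0.03, "F": 0.02, "A": 0.02},
}


def viterbi(melody):
    """Return the most likely chord sequence for a list of melody notes."""
    # best[s]: log-probability of the best chord path that ends in chord s.
    best = {s: math.log(START[s]) + math.log(EMIT[s][melody[0]]) for s in CHORDS}
    back = []  # one dict of back-pointers per step after the first
    for note in melody[1:]:
        prev, best, pointers = best, {}, {}
        for s in CHORDS:
            # Pick the predecessor chord that maximises the path score into s.
            p = max(CHORDS, key=lambda r: prev[r] + math.log(TRANS[r][s]))
            pointers[s] = p
            best[s] = prev[p] + math.log(TRANS[p][s]) + math.log(EMIT[s][note])
        back.append(pointers)
    # Trace the best path backwards from the most likely final chord.
    path = [max(CHORDS, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


print(viterbi(["E", "A", "B", "C"]))  # → ['C', 'F', 'G', 'C']
```

In this sketch the decoder sees the whole melody before it commits to any chord, which loosely mirrors a looper that harmonises a phrase only after it has been recorded.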

Keywords

RNN, Deep learning, Music interaction, Machine improvisation

Notes

Acknowledgment

This work was supported by The Research Council of Norway as a part of the Engineering Predictability with Embodied Cognition (EPEC) project, under grant agreement 240862 and through its Centres of Excellence scheme, project number 262762.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. RITMO Centre for Interdisciplinary Studies in Rhythm, Time, and Motion, Department of Informatics, University of Oslo, Oslo, Norway