The emergence of deep learning: new opportunities for music and audio technologies
There has been tremendous interest in deep learning across many fields of study. Recently, these techniques have also gained popularity in the field of music. Projects such as Magenta (Google Brain’s music generation project), Jukedeck, and IBM Watson Beat testify to their potential. Given this rising interest in using deep neural networks to tackle tasks in the domain of audio and music, the guest editors organized the first International Workshop on Music and Audio as part of the International Joint Conference on Neural Networks (IJCNN) in Anchorage, Alaska, in 2017. The current NCAA issue on “Deep learning for music and audio” was born out of that workshop. Its topics include:
deep learning for computational music research;
modeling hierarchical and long-term music structures using deep learning;
modeling ambiguity and preference in music;
applications of deep networks for music and audio such as audio transcription, voice separation, music generation, music recommendation, etc.;
novel architectures designed to represent music and audio.
In addition to applications, a number of papers in this special issue examine the meaningful concepts that deep networks can learn from music and audio, compare the performance of different architectures on feature learning, and investigate the impact of challenging acoustic scenarios. Chuan et al. show that musical concepts such as key and chords can be captured by statistical learning methods such as word2vec, a technique commonly used in natural language processing. Wieser et al. explore convolutional neural networks for audio emotion recognition and find that these networks can learn meaningful features related to certain emotions. Deng et al. propose a novel deep time–frequency LSTM for audio restoration that explicitly captures temporal and spectral dynamics, allowing for more effective low-bitrate audio restoration. Dörfler et al. show that the design of the audio filter and the time–frequency resolution affect the accuracy of convolutional neural networks when used as classifiers. Kiskin et al. focus on detecting low signal-to-noise-ratio acoustic events (e.g., the presence of mosquitoes in audio recordings) with convolutional neural networks and other machine learning techniques, using acoustic features extracted by different transforms. Finally, Kim et al. examine the effect of different deep architectures and multiple learning sources on a model’s ability to learn efficient musical representations.
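The word2vec approach mentioned above treats symbolic music like text, with chords as word tokens whose co-occurrence in progressions induces an embedding space. The following is a minimal, self-contained sketch of that idea, using a from-scratch skip-gram model with negative sampling in NumPy; the toy chord corpus, hyperparameters, and helper names are illustrative assumptions, not the data or implementation of Chuan et al.

```python
import numpy as np

# Toy "sentences" of chord tokens; chord names here are purely illustrative.
corpus = [
    ["C", "F", "G", "C"],
    ["C", "Am", "F", "G"],
    ["G", "C", "F", "G", "C"],
    ["Am", "F", "C", "G"],
] * 50

vocab = sorted({c for seq in corpus for c in seq})
idx = {c: i for i, c in enumerate(vocab)}
V, D = len(vocab), 8              # vocabulary size, embedding dimension

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))   # target-chord embeddings
W_out = rng.normal(scale=0.1, size=(V, D))  # context-chord embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, window, neg = 0.05, 2, 3
for epoch in range(5):
    for seq in corpus:
        for t, center in enumerate(seq):
            ci = idx[center]
            for o in range(max(0, t - window), min(len(seq), t + window + 1)):
                if o == t:
                    continue
                # One positive (true context) pair plus `neg` random negatives.
                # (Negatives may occasionally hit the true context; fine for a sketch.)
                pairs = [(idx[seq[o]], 1.0)] + \
                        [(int(k), 0.0) for k in rng.integers(0, V, neg)]
                for ti, y in pairs:
                    v_in = W_in[ci].copy()          # snapshot before updating
                    grad = sigmoid(v_in @ W_out[ti]) - y
                    W_in[ci] -= lr * grad * W_out[ti]
                    W_out[ti] -= lr * grad * v_in

def most_similar(chord):
    """Rank the other chords by cosine similarity in the learned space."""
    v = W_in[idx[chord]]
    sims = (W_in @ v) / (np.linalg.norm(W_in, axis=1) * np.linalg.norm(v) + 1e-9)
    return [vocab[i] for i in np.argsort(-sims) if vocab[i] != chord]

print(most_similar("C"))
```

After training, chords that appear in similar contexts end up near one another, which is the mechanism by which word2vec-style models can pick up on key and harmonic function without any explicit music theory.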
We hope readers will enjoy the manuscripts in this special issue. Our thanks go out to all of the authors, reviewers, the editor-in-chief, and the editorial office of NCAA for their support. Exciting times lie ahead for the field of audio and music technologies.