Abstract
Supervised learning models have been applied to create good onset detection systems for musical audio signals. However, supervised training requires a large set of labeled training examples, and hand-labeling is tedious and time-consuming. In this paper, we present a bootstrap learning approach to training an accurate note onset detection model. Audio alignment techniques are first used to find the correspondence between a symbolic music representation (such as MIDI data) and an acoustic recording. This alignment provides an initial estimate of note boundaries, which can be used to train an onset detector. Once trained, the detector can be used to refine the initial set of note boundaries, and training can be repeated. This iterative training process eliminates the need for hand-labeled audio. Tests show that this training method can improve an onset detector initially trained on synthetic data.
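The iterative loop described in the abstract (estimate boundaries from an alignment, train a detector, refine the boundaries, repeat) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the one-dimensional `features` list stands in for real audio features, the threshold-based `train_detector` stands in for the actual onset detector, and the names `refine_onsets`, `bootstrap`, `window`, and `max_iters` are hypothetical.

```python
def train_detector(features, onsets):
    """Toy 'training' step (an assumption, not the paper's model): learn a
    threshold halfway between the mean feature value at labeled onset frames
    and the mean feature value at all other frames."""
    onset_set = set(onsets)
    pos = [features[i] for i in onset_set]
    neg = [f for i, f in enumerate(features) if i not in onset_set]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0
    # The returned score is positive for frames that look like onsets.
    return lambda frame_value: frame_value - threshold


def refine_onsets(features, onsets, detector, window):
    """Move each estimated onset to the best-scoring frame within +/- window
    frames, but only if the detector scores that frame as an onset."""
    refined = []
    for t in onsets:
        lo = max(0, t - window)
        hi = min(len(features), t + window + 1)
        best = max(range(lo, hi), key=lambda i: detector(features[i]))
        refined.append(best if detector(features[best]) > 0 else t)
    return refined


def bootstrap(features, initial_onsets, window=3, max_iters=5):
    """Alternate between training on the current labels and refining them,
    stopping early once the labels no longer change."""
    onsets = list(initial_onsets)
    for _ in range(max_iters):
        detector = train_detector(features, onsets)
        new_onsets = refine_onsets(features, onsets, detector, window)
        if new_onsets == onsets:
            break
        onsets = new_onsets
    return onsets


# Hypothetical data: a flat feature track with three sharp onsets.
features = [0.1] * 40
for true_onset in (10, 20, 30):
    features[true_onset] = 1.0

# Rough boundaries, standing in for the output of score-to-audio alignment.
rough_onsets = [12, 19, 30]
print(bootstrap(features, rough_onsets))  # → [10, 20, 30]
```

In this toy setting the rough labels are off by up to two frames, and a single refinement pass moves them onto the true onsets; the second pass detects no change and stops. A real system would replace the threshold with a trained classifier over spectral features, which this sketch does not attempt.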
Additional information
A major part of this work was done while the first author was at Carnegie Mellon University.
Editor: Gerhard Widmer
About this article
Cite this article
Hu, N., Dannenberg, R.B. Bootstrap learning for accurate onset detection. Mach Learn 65, 457–471 (2006). https://doi.org/10.1007/s10994-006-8458-5
DOI: https://doi.org/10.1007/s10994-006-8458-5