Abstract
We present a new method for aligning a polyphonic musical score with a corresponding sampled audio performance. The method uses a graphical model containing both latent discrete variables, which represent score position, and a latent continuous tempo process. The data model is deliberately simple, based only on the pitch content of the audio signal. The data interpretation is defined as the most likely configuration of the hidden variables given the data, and we develop computational methodology to identify or approximate this configuration using a variant of dynamic programming that involves parametrically represented continuous variables. Experiments are reported on a 55-minute hand-marked orchestral test set.
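The discrete backbone of such an alignment can be illustrated with a left-to-right Viterbi pass over score events. The sketch below is only a simplified stand-in for the paper's method: it omits the continuous tempo process entirely, and the function name, the stay/advance transition model, and the `stay_prob` parameter are assumptions introduced for illustration, not the paper's actual parameterization.

```python
import numpy as np

def viterbi_align(frame_probs, stay_prob=0.8):
    """Align audio frames to score events with a left-to-right Viterbi pass.

    frame_probs[t, j] is the likelihood of audio frame t given score event j
    (e.g. derived from pitch content). At each frame the hidden state either
    stays on the current event or advances to the next one. This is a purely
    discrete simplification: the paper's hybrid model additionally carries a
    continuous tempo variable, which is omitted here.
    Returns the most likely score-event index for each frame.
    """
    T, J = frame_probs.shape
    log_p = np.log(frame_probs + 1e-12)      # avoid log(0)
    log_stay = np.log(stay_prob)
    log_adv = np.log(1.0 - stay_prob)

    delta = np.full((T, J), -np.inf)         # best log-prob ending in (t, j)
    back = np.zeros((T, J), dtype=int)       # backpointers for the best path
    delta[0, 0] = log_p[0, 0]                # the path must start at event 0
    for t in range(1, T):
        for j in range(J):
            stay = delta[t - 1, j] + log_stay
            adv = delta[t - 1, j - 1] + log_adv if j > 0 else -np.inf
            if stay >= adv:
                delta[t, j] = stay + log_p[t, j]
                back[t, j] = j
            else:
                delta[t, j] = adv + log_p[t, j]
                back[t, j] = j - 1

    # Trace the best path backwards from the most likely final state.
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Toy example: 6 frames, 3 score events, with likelihoods clearly peaked
# on one event per pair of frames.
probs = np.array([
    [0.90, 0.05, 0.05],
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
    [0.05, 0.05, 0.90],
])
print(viterbi_align(probs))  # monotone path: [0 0 1 1 2 2]
```

Because the state space is linearly ordered, the recovered path is guaranteed to be monotone nondecreasing, which is the key structural property any score-to-audio alignment must satisfy.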
Additional information
This work was supported by NSF grants IIS-0113496 and IIS-0534694.
Editor: Gerhard Widmer
Cite this article
Raphael, C. Aligning music audio with symbolic scores using a hybrid graphical model. Mach Learn 65, 389–409 (2006). https://doi.org/10.1007/s10994-006-8415-3