Abstract
The field of Music Information Retrieval (MIR) focuses on creating methods and practices for making sense of music data from various modalities, including audio, video, images, scores and metadata [54]. Within MIR, a core problem that remains open to this day is Automatic Music Transcription (AMT): the process of automatically converting an acoustic music signal into some form of musical notation. Methods for automatically converting musical audio to notation have several uses within and beyond MIR: from software for automatic typesetting of audio into staff notation or other music representations, to the use of automatic transcriptions as descriptors for music recommendation systems, to interactive music applications such as automatic accompaniment, to music education through automatic instrument tutoring, and to enabling musicological research in sound archives, to name but a few.
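To make the task concrete, the sketch below shows a deliberately minimal AMT pipeline in Python: a thresholded constant-Q spectrogram stands in for frame-level multi-pitch detection, and runs of consecutive active frames are grouped into note events of the form (pitch, onset, offset). This is an illustrative toy under stated assumptions, not a method from this chapter: the use of the librosa library, the fixed decibel threshold, and the helper name toy_transcribe are chosen here for illustration, whereas real systems replace the thresholding step with learned acoustic models such as those surveyed in [12].

import numpy as np
import librosa

def toy_transcribe(path, threshold_db=-30.0):
    # Load audio; 22050 Hz is librosa's default sampling rate.
    y, sr = librosa.load(path, sr=22050)
    # Constant-Q spectrogram with one bin per semitone over the 88 piano keys (A0-C8).
    C = np.abs(librosa.cqt(y, sr=sr, fmin=librosa.note_to_hz('A0'),
                           n_bins=88, bins_per_octave=12))
    C_db = librosa.amplitude_to_db(C, ref=np.max)
    # Naive frame-level multi-pitch "detector": a fixed threshold (an assumption,
    # standing in for a learned acoustic model).
    active = C_db > threshold_db
    hop_time = 512 / sr  # librosa.cqt uses hop_length=512 by default
    notes = []
    for row in range(88):  # row 0 = A0 = MIDI note 21
        frames = np.flatnonzero(active[row])
        if frames.size == 0:
            continue
        # Note tracking: split runs of consecutive active frames into events.
        for seg in np.split(frames, np.where(np.diff(frames) > 1)[0] + 1):
            notes.append((row + 21, seg[0] * hop_time, (seg[-1] + 1) * hop_time))
    return notes  # list of (midi_pitch, onset_seconds, offset_seconds)

Even this toy illustrates why AMT is hard: overlapping harmonics from simultaneous notes cause the thresholded spectrogram to register spurious pitches, which is precisely the problem that the multi-pitch detection literature cited below addresses.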
References
Music Information Retrieval Evaluation eXchange (MIREX). Retrieved April 29, 2020, from http://music-ir.org/mirexwiki/.
SyncRWC dataset. Retrieved April 29, 2020, from https://staff.aist.go.jp/m.goto/RWC-MDB/AIST-Annotation/SyncRWC/.
Bay, M., Ehmann, A. F., & Downie, J. S. (2009). Evaluation of multiple-F0 estimation and tracking systems. In 10th International Society for Music Information Retrieval Conference (pp. 315–320). Kobe, Japan.
Bello, J. P., Daudet, L., Abdallah, S., Duxbury, C., Davies, M., & Sandler, M. (2005). A tutorial on onset detection of music signals. IEEE Transactions on Audio, Speech, and Language Processing, 13(5), 1035–1047.
Benetos, E., Dixon, S., Duan, Z., & Ewert, S. (2019). Automatic music transcription: An overview. IEEE Signal Processing Magazine, 36(1), 20–30. https://doi.org/10.1109/MSP.2018.2869928.
Benetos, E., Dixon, S., Giannoulis, D., Kirchhoff, H., & Klapuri, A. (2013). Automatic music transcription: Challenges and future directions. Journal of Intelligent Information Systems, 41(3), 407–434. https://doi.org/10.1007/s10844-013-0258-3.
Benetos, E., & Holzapfel, A. (2013). Automatic transcription of Turkish makam music. In 14th International Society for Music Information Retrieval Conference (pp. 355–360). Curitiba, Brazil.
Benetos, E., & Weyde, T. (2015). An efficient temporally-constrained probabilistic model for multiple-instrument music transcription. In 16th International Society for Music Information Retrieval Conference (ISMIR) (pp. 701–707). Malaga, Spain.
Bittner, R., & Bosch, J. J. (2019). Generalized metrics for single-F0 estimation evaluation. In Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR (pp. 738–745). Delft, Netherlands.
Bittner, R., Salamon, J., Tierney, M., Mauch, M., Cannam, C., & Bello, J. (2014). MedleyDB: A multitrack dataset for annotation-intensive MIR research. In International Society for Music Information Retrieval Conference (pp. 155–160). Taipei, Taiwan.
Bittner, R. M., McFee, B., & Bello, J. P. (2018). Multitask learning for fundamental frequency estimation in music. arXiv:1809.00381 [cs.SD].
Bittner, R. M., McFee, B., Salamon, J., Li, P., & Bello, J. P. (2017). Deep salience representations for F0 estimation in polyphonic music. In International Society for Music Information Retrieval Conference (pp. 63–70). Suzhou, China.
Böck, S., & Schedl, M. (2012). Polyphonic piano note transcription with recurrent neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 121–124). Kyoto, Japan.
Bosch, J. J., Marxer, R., & Gómez, E. (2016). Evaluation and combination of pitch estimation methods for melody extraction in symphonic classical music. Journal of New Music Research, 45(2), 101–117.
Boulanger-Lewandowski, N., Bengio, Y., & Vincent, P. (2012). Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In 29th International Conference on Machine Learning. Edinburgh, Scotland, UK.
Carvalho, R. G. C., & Smaragdis, P. (2017). Towards end-to-end polyphonic music transcription: Transforming music audio directly to a score. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (pp. 151–155).
Chen, C. H. (2016). Handbook of Pattern Recognition and Computer Vision (5th ed.). River Edge, NJ, USA: World Scientific Publishing Co., Inc.
Cogliati, A., & Duan, Z. (2017). A metric for music notation transcription accuracy. In Proceedings of the International Society for Music Information Retrieval Conference (pp. 407–413).
Cogliati, A., Duan, Z., & Wohlberg, B. (2017). Piano transcription with convolutional sparse lateral inhibition. IEEE Signal Processing Letters, 24(4), 392–396.
Cogliati, A., Temperley, D., & Duan, Z. (2016). Transcribing human piano performances into music notation. In Proceedings of the International Society for Music Information Retrieval Conference (pp. 758–764).
Daniel, A., Emiya, V., & David, B. (2008). Perceptually-based evaluation of the errors usually made when automatically transcribing music. In International Society for Music Information Retrieval Conference, ISMIR (pp. 550–555).
Duan, Z., Pardo, B., & Zhang, C. (2010). Multiple fundamental frequency estimation by modeling spectral peaks and non-peak regions. IEEE Transactions on Audio, Speech, and Language Processing, 18(8), 2121–2133.
Duan, Z., & Temperley, D. (2014). Note-level music transcription by maximum likelihood sampling. In 15th International Society for Music Information Retrieval Conference (ISMIR). Taipei, Taiwan.
Emiya, V., Badeau, R., & David, B. (2010). Multipitch estimation of piano sounds using a new probabilistic spectral smoothness principle. IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1643–1654.
Gómez, E., & Bonada, J. (2013). Towards computer-assisted flamenco transcription: An experimental comparison of automatic transcription algorithms as applied to a cappella singing. Computer Music Journal, 37(2), 73–90. https://doi.org/10.1162/COMJ_a_00180.
Goto, M., Hashiguchi, H., Nishimura, T., & Oka, R. (2003). RWC music database: Music genre database and musical instrument sound database. In International Conference on Music Information Retrieval. Baltimore, USA.
Hartmann, W. M. (1996). Pitch, periodicity, and auditory organization. The Journal of the Acoustical Society of America, 100(6), 3491–3502. https://doi.org/10.1121/1.417248.
Hawthorne, C., Elsen, E., Song, J., Roberts, A., Simon, I., Raffel, C., et al. (2018). Onsets and frames: Dual-objective piano transcription. In Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR (pp. 50–57). Paris, France.
Hawthorne, C., Stasyuk, A., Roberts, A., Simon, I., Huang, C. Z. A., Dieleman, S., et al. (2019). Enabling factorized piano music modeling and generation with the MAESTRO dataset. In International Conference on Learning Representations (ICLR).
Humphrey, E. J., Durand, S., & McFee, B. (2018). OpenMIC-2018: An open dataset for multiple instrument recognition. In 19th International Society for Music Information Retrieval Conference (pp. 438–444). Paris, France.
Jurafsky, D., & Martin, J. H. (2008). Speech and language processing (2nd ed.). Pearson.
Kelz, R., Böck, S., & Widmer, G. (2019). Multitask learning for polyphonic piano transcription, a case study. In International Workshop on Multilayer Music Representation and Processing (MMRP) (pp. 85–91). https://doi.org/10.1109/MMRP.2019.8665372.
Kelz, R., Dorfer, M., Korzeniowski, F., Böck, S., Arzt, A., & Widmer, G. (2016). On the potential of simple framewise approaches to piano transcription. In Proceedings of International Society for Music Information Retrieval Conference (pp. 475–481).
Kelz, R., & Widmer, G. (2017). An experimental analysis of the entanglement problem in neural-network-based music transcription systems. In Audio Engineering Society Conference: 2017 AES International Conference on Semantic Audio.
Kim, J. W., & Bello, J. P. (2019). Adversarial learning for improved onsets and frames music transcription. In 20th International Society for Music Information Retrieval Conference (ISMIR). Delft, Netherlands.
Klapuri, A., & Davy, M. (Eds.). (2006). Signal processing methods for music transcription. New York: Springer.
Luo, Y. J., & Su, L. (2018). Learning domain-adaptive latent representations of music signals using variational autoencoders. In Proceedings of International Society for Music Information Retrieval Conference (pp. 653–660). Paris, France.
Manilow, E., Wichern, G., Seetharaman, P., & Le Roux, J. (2019). Cutting music source separation some Slakh: A dataset to study the impact of training data quality and quantity. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. New Paltz, NY.
McLeod, A., & Steedman, M. (2018). Evaluating automatic polyphonic music transcription. In Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR (pp. 42–49). Paris, France.
McLeod, A., & Yoshii, K. (2019). Evaluating non-aligned musical score transcriptions with MV2H. In Extended Abstract for Late-Breaking/Demo in International Society for Music Information Retrieval Conference, ISMIR.
McVicar, M., Santos-Rodríguez, R., Ni, Y., & De Bie, T. (2014). Automatic chord estimation from audio: A review of the state of the art. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(2), 556–575. https://doi.org/10.1109/TASLP.2013.2294580.
Molina, E., Barbancho, A. M., Tardón, L. J., & Barbancho, I. (2014). Evaluation framework for automatic singing transcription. In International Society for Music Information Retrieval Conference (pp. 567–572).
Nakamura, E., Benetos, E., Yoshii, K., & Dixon, S. (2018). Towards complete polyphonic music transcription: Integrating multi-pitch detection and rhythm quantization. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Calgary, Canada.
Nakamura, E., Yoshii, K., & Sagayama, S. (2017). Rhythm transcription of polyphonic piano music based on merged-output HMM for multiple voices. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(4), 794–806.
Nam, J., Ngiam, J., Lee, H., & Slaney, M. (2011). A classification-based polyphonic piano transcription approach using learned feature representations. In 12th International Society for Music Information Retrieval Conference (pp. 175–180). Miami, Florida, USA.
Nishikimi, R., Nakamura, E., Fukayama, S., Goto, M., & Yoshii, K. (2019). Automatic singing transcription based on encoder-decoder recurrent neural networks with a weakly-supervised attention mechanism. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing.
Piszczalski, M., & Galler, B. A. (1977). Automatic music transcription. Computer Music Journal, 1(4), 24–31.
Poliner, G., & Ellis, D. (2007). A discriminative model for polyphonic piano transcription. EURASIP Journal on Advances in Signal Processing, 8, 154–162.
Raffel, C. (2016). Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching. Ph.D. thesis, Columbia University.
Rebelo, A., Fujinaga, I., Paszkiewicz, F., Marcal, A. R. S., Guedes, C., & Cardoso, J. S. (2012). Optical music recognition: State-of-the-art and open issues. International Journal of Multimedia Information Retrieval, 1(3), 173–190. https://doi.org/10.1007/s13735-012-0004-6.
Román, M. A., Pertusa, A., & Calvo-Zaragoza, J. (2019). A holistic approach to polyphonic music transcription with neural networks. In Proceedings of the 20th International Society for Music Information Retrieval Conference (pp. 731–737). Delft, Netherlands.
Ruder, S. (2017). An overview of multi-task learning in deep neural networks. arXiv:1706.05098.
Salamon, J., Gómez, E., Ellis, D., & Richard, G. (2014). Melody extraction from polyphonic music signals: Approaches, applications, and challenges. IEEE Signal Processing Magazine, 31(2), 118–134. https://doi.org/10.1109/MSP.2013.2271648.
Serra, X., Magas, M., Benetos, E., Chudy, M., Dixon, S., Flexer, A., et al. (2013). Roadmap for music information research. Creative Commons BY-NC-ND 3.0 license.
Sigtia, S., Benetos, E., & Dixon, S. (2016). An end-to-end neural network for polyphonic piano music transcription. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(5), 927–939. https://doi.org/10.1109/TASLP.2016.2533858.
Smaragdis, P., & Brown, J. C. (2003). Non-negative matrix factorization for polyphonic music transcription. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (pp. 177–180). New Paltz, USA.
Su, L., & Yang, Y. H. (2015). Escaping from the abyss of manual annotation: New methodology of building polyphonic datasets for automatic music transcription. In International Symposium on Computer Music Multidisciplinary Research.
Thickstun, J., Harchaoui, Z., & Kakade, S. M. (2017). Learning features of music from scratch. In International Conference on Learning Representations (ICLR).
Virtanen, T., Plumbley, M. D., & Ellis, D. P. W. (Eds.). (2018). Computational analysis of sound scenes and events. Springer.
Wang, Q., Zhou, R., & Yan, Y. (2018). Polyphonic piano transcription with a note-based music language model. Applied Sciences, 8(3). https://doi.org/10.3390/app8030470.
Wu, C., Dittmar, C., Southall, C., Vogl, R., Widmer, G., Hockman, J., et al. (2018). A review of automatic drum transcription. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(9), 1457–1483. https://doi.org/10.1109/TASLP.2018.2830113.
Xi, Q., Bittner, R. M., Pauwels, J., Ye, X., & Bello, J. P. (2018). Guitarset: A dataset for guitar transcription. In Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR (pp. 453–460). Paris, France.
Ycart, A., & Benetos, E. (2017). A study on LSTM networks for polyphonic music sequence modelling. In 18th International Society for Music Information Retrieval Conference (ISMIR) (pp. 421–427).
Ycart, A., & Benetos, E. (2018). A-MAPS: Augmented MAPS dataset with rhythm and key annotations. In 19th International Society for Music Information Retrieval Conference Late Breaking and Demo Papers. Paris, France.
Ycart, A., McLeod, A., Benetos, E., & Yoshii, K. (2019). Blending acoustic and language model predictions for automatic music transcription. In 20th International Society for Music Information Retrieval Conference (ISMIR).
Yu, D., & Deng, L. (Eds.). (2015). Automatic Speech Recognition: A Deep Learning Approach. London: Springer.
Acknowledgements
L. Liu is a research student at the UKRI Centre for Doctoral Training in Artificial Intelligence and Music and is supported by a China Scholarship Council and Queen Mary University of London joint Ph.D. scholarship. The work of E. Benetos was supported by RAEng Research Fellowship RF/128 and a Turing Fellowship.
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Liu, L., & Benetos, E. (2021). From Audio to Music Notation. In: Miranda, E. R. (Ed.), Handbook of Artificial Intelligence for Music. Springer, Cham. https://doi.org/10.1007/978-3-030-72116-9_24
DOI: https://doi.org/10.1007/978-3-030-72116-9_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72115-2
Online ISBN: 978-3-030-72116-9