
From Audio to Music Notation

Chapter in Handbook of Artificial Intelligence for Music

Abstract

The field of Music Information Retrieval (MIR) focuses on creating methods and practices for making sense of music data from various modalities, including audio, video, images, scores and metadata [54]. Within MIR, a core problem that to this day remains open is Automatic Music Transcription (AMT): the process of automatically converting an acoustic music signal into some form of musical notation. Methods for automatically converting musical audio to notation have uses within and beyond MIR: software for automatic typesetting of audio into staff notation or other music representations; automatic transcriptions as descriptors for music recommendation systems; interactive music applications such as automatic accompaniment; music education through automatic instrument tutoring; and musicological research in sound archives, to name but a few.
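
To make the transcription task concrete, here is a minimal, deliberately naive sketch of frame-level audio-to-notes conversion, assuming the librosa and pretty_midi Python libraries are available: a constant-Q spectrogram is thresholded into frame-wise pitch activations, consecutive active frames are merged into note events, and the events are written to a MIDI file. The function and file names are hypothetical, and simple spectrogram thresholding falls far short of the state-of-the-art systems surveyed in this chapter; the sketch only illustrates the shape of the problem.

    # Illustrative frame-level transcription sketch (assumes librosa and pretty_midi).
    # A naive baseline for exposition only, not the chapter's approach.
    import numpy as np
    import librosa
    import pretty_midi

    def naive_transcribe(audio_path, out_midi_path, threshold_db=-30.0):
        y, sr = librosa.load(audio_path, sr=22050, mono=True)

        # Constant-Q transform: one bin per semitone from C1 upwards (88 bins, roughly piano range).
        fmin = librosa.note_to_hz("C1")
        hop = 512
        C = np.abs(librosa.cqt(y, sr=sr, hop_length=hop, fmin=fmin,
                               n_bins=88, bins_per_octave=12))
        C_db = librosa.amplitude_to_db(C, ref=np.max)

        # Frame-level "multi-pitch" estimate: a bin is active if it is within
        # threshold_db of the spectrogram maximum.
        active = C_db > threshold_db  # shape: (88 pitches, n_frames)

        frame_time = hop / sr
        midi = pretty_midi.PrettyMIDI()
        inst = pretty_midi.Instrument(program=0)  # piano

        # Merge consecutive active frames of the same pitch into note events.
        base_midi_pitch = int(round(librosa.hz_to_midi(fmin)))
        for p in range(active.shape[0]):
            onset = None
            for t in range(active.shape[1]):
                if active[p, t] and onset is None:
                    onset = t
                elif not active[p, t] and onset is not None:
                    inst.notes.append(pretty_midi.Note(
                        velocity=80, pitch=base_midi_pitch + p,
                        start=onset * frame_time, end=t * frame_time))
                    onset = None
            if onset is not None:  # note still sounding at the end of the clip
                inst.notes.append(pretty_midi.Note(
                    velocity=80, pitch=base_midi_pitch + p,
                    start=onset * frame_time, end=active.shape[1] * frame_time))

        midi.instruments.append(inst)
        midi.write(out_midi_path)

    # Example usage (hypothetical file names):
    # naive_transcribe("piano_clip.wav", "piano_clip_transcription.mid")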


References

  1. Music Information Retrieval Evaluation eXchange (MIREX). Retrieved April 29, 2020, from http://music-ir.org/mirexwiki/.

  2. SyncRWC dataset. Retrieved April 29, 2020, from https://staff.aist.go.jp/m.goto/RWC-MDB/AIST-Annotation/SyncRWC/.

  3. Bay, M., Ehmann, A. F., & Downie, J. S. (2009). Evaluation of multiple-F0 estimation and tracking systems. In 10th International Society for Music Information Retrieval Conference (pp. 315–320). Kobe, Japan.

  4. Bello, J. P., Daudet, L., Abdallah, S., Duxbury, C., Davies, M., & Sandler, M. (2005). A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing, 13(5), 1035–1047.

  5. Benetos, E., Dixon, S., Duan, Z., & Ewert, S. (2019). Automatic music transcription: An overview. IEEE Signal Processing Magazine, 36(1), 20–30. https://doi.org/10.1109/MSP.2018.2869928.

  6. Benetos, E., Dixon, S., Giannoulis, D., Kirchhoff, H., & Klapuri, A. (2013). Automatic music transcription: Challenges and future directions. Journal of Intelligent Information Systems, 41(3), 407–434. https://doi.org/10.1007/s10844-013-0258-3.

  7. Benetos, E., & Holzapfel, A. (2013). Automatic transcription of Turkish makam music. In 14th International Society for Music Information Retrieval Conference (pp. 355–360). Curitiba, Brazil.

  8. Benetos, E., & Weyde, T. (2015). An efficient temporally-constrained probabilistic model for multiple-instrument music transcription. In 16th International Society for Music Information Retrieval Conference (ISMIR) (pp. 701–707). Malaga, Spain.

  9. Bittner, R., & Bosch, J. J. (2019). Generalised metrics for single-F0 estimation evaluation. In Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR (pp. 738–745). Delft, Netherlands.

  10. Bittner, R., Salamon, J., Tierney, M., Mauch, M., Cannam, C., & Bello, J. (2014). MedleyDB: A multitrack dataset for annotation-intensive MIR research. In International Society for Music Information Retrieval Conference (pp. 155–160). Taipei, Taiwan.

  11. Bittner, R. M., McFee, B., & Bello, J. P. (2018). Multitask learning for fundamental frequency estimation in music. arXiv:1809.00381 [cs.SD].

  12. Bittner, R. M., McFee, B., Salamon, J., Li, P., & Bello, J. P. (2017). Deep salience representations for F0 estimation in polyphonic music. In International Society for Music Information Retrieval Conference (pp. 63–70). Suzhou, China.

  13. Böck, S., & Schedl, M. (2012). Polyphonic piano note transcription with recurrent neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 121–124). Kyoto, Japan.

  14. Bosch, J. J., Marxer, R., & Gómez, E. (2016). Evaluation and combination of pitch estimation methods for melody extraction in symphonic classical music. Journal of New Music Research, 45(2), 101–117.

  15. Boulanger-Lewandowski, N., Bengio, Y., & Vincent, P. (2012). Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In 29th International Conference on Machine Learning. Edinburgh, Scotland, UK.

  16. Carvalho, R. G. C., & Smaragdis, P. (2017). Towards end-to-end polyphonic music transcription: Transforming music audio directly to a score. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (pp. 151–155).

  17. Chen, C. H. (2016). Handbook of pattern recognition and computer vision (5th ed.). River Edge, NJ, USA: World Scientific Publishing Co., Inc.

  18. Cogliati, A., & Duan, Z. (2017). A metric for music notation transcription accuracy. In Proceedings of the International Society for Music Information Retrieval Conference (pp. 407–413).

  19. Cogliati, A., Duan, Z., & Wohlberg, B. (2017). Piano transcription with convolutional sparse lateral inhibition. IEEE Signal Processing Letters, 24(4), 392–396.

  20. Cogliati, A., Temperley, D., & Duan, Z. (2016). Transcribing human piano performances into music notation. In Proceedings of the International Society for Music Information Retrieval Conference (pp. 758–764).

  21. Daniel, A., Emiya, V., & David, B. (2008). Perceptually-based evaluation of the errors usually made when automatically transcribing music. In International Society for Music Information Retrieval Conference, ISMIR (pp. 550–555).

  22. Duan, Z., Pardo, B., & Zhang, C. (2010). Multiple fundamental frequency estimation by modeling spectral peaks and non-peak regions. IEEE Transactions on Audio, Speech, and Language Processing, 18(8), 2121–2133.

  23. Duan, Z., & Temperley, D. (2014). Note-level music transcription by maximum likelihood sampling. In International Society for Music Information Retrieval Conference (ISMIR). Taipei, Taiwan.

  24. Emiya, V., Badeau, R., & David, B. (2010). Multipitch estimation of piano sounds using a new probabilistic spectral smoothness principle. IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1643–1654.

  25. Gómez, E., & Bonada, J. (2013). Towards computer-assisted flamenco transcription: An experimental comparison of automatic transcription algorithms as applied to a cappella singing. Computer Music Journal, 37(2), 73–90. https://doi.org/10.1162/COMJ_a_00180.

  26. Goto, M., Hashiguchi, H., Nishimura, T., & Oka, R. (2003). RWC music database: Music genre database and musical instrument sound database. In International Conference on Music Information Retrieval. Baltimore, USA.

  27. Hartmann, W. M. (1996). Pitch, periodicity, and auditory organization. The Journal of the Acoustical Society of America, 100(6), 3491–3502. https://doi.org/10.1121/1.417248.

  28. Hawthorne, C., Elsen, E., Song, J., Roberts, A., Simon, I., Raffel, C., et al. (2018). Onsets and frames: Dual-objective piano transcription. In Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR (pp. 50–57). Paris, France.

  29. Hawthorne, C., Stasyuk, A., Roberts, A., Simon, I., Huang, C. Z. A., Dieleman, S., et al. (2019). Enabling factorized piano music modeling and generation with the MAESTRO dataset. In International Conference on Learning Representations (ICLR).

  30. Humphrey, E. J., Durand, S., & McFee, B. (2018). OpenMIC-2018: An open dataset for multiple instrument recognition. In 19th International Society for Music Information Retrieval Conference (pp. 438–444). Paris, France.

  31. Jurafsky, D., & Martin, J. H. (2008). Speech and language processing (2nd ed.). Pearson.

  32. Kelz, R., Böck, S., & Widmer, G. (2019). Multitask learning for polyphonic piano transcription, a case study. In International Workshop on Multilayer Music Representation and Processing (MMRP) (pp. 85–91). https://doi.org/10.1109/MMRP.2019.8665372.

  33. Kelz, R., Dorfer, M., Korzeniowski, F., Böck, S., Arzt, A., & Widmer, G. (2016). On the potential of simple framewise approaches to piano transcription. In Proceedings of the International Society for Music Information Retrieval Conference (pp. 475–481).

  34. Kelz, R., & Widmer, G. (2017). An experimental analysis of the entanglement problem in neural-network-based music transcription systems. In 2017 AES International Conference on Semantic Audio.

  35. Kim, J. W., & Bello, J. P. (2019). Adversarial learning for improved onsets and frames music transcription. In International Society for Music Information Retrieval Conference (ISMIR).

  36. Klapuri, A., & Davy, M. (Eds.). (2006). Signal processing methods for music transcription. New York: Springer.

  37. Luo, Y. J., & Su, L. (2018). Learning domain-adaptive latent representations of music signals using variational autoencoders. In Proceedings of the International Society for Music Information Retrieval Conference (pp. 653–660). Paris, France.

  38. Manilow, E., Wichern, G., Seetharaman, P., & Le Roux, J. (2019). Cutting music source separation some Slakh: A dataset to study the impact of training data quality and quantity. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. New Paltz, NY.

  39. McLeod, A., & Steedman, M. (2018). Evaluating automatic polyphonic music transcription. In Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR (pp. 42–49). Paris, France.

  40. McLeod, A., & Yoshii, K. (2019). Evaluating non-aligned musical score transcriptions with MV2H. In Extended Abstracts for the Late-Breaking/Demo Session, International Society for Music Information Retrieval Conference (ISMIR).

  41. McVicar, M., Santos-Rodríguez, R., Ni, Y., & Bie, T. D. (2014). Automatic chord estimation from audio: A review of the state of the art. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(2), 556–575. https://doi.org/10.1109/TASLP.2013.2294580.

  42. Molina, E., Barbancho, A. M., Tardón, L. J., & Barbancho, I. (2014). Evaluation framework for automatic singing transcription. In International Society for Music Information Retrieval Conference (pp. 567–572).

  43. Nakamura, E., Benetos, E., Yoshii, K., & Dixon, S. (2018). Towards complete polyphonic music transcription: Integrating multi-pitch detection and rhythm quantization. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

  44. Nakamura, E., Yoshii, K., & Sagayama, S. (2017). Rhythm transcription of polyphonic piano music based on merged-output HMM for multiple voices. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(4), 794–806.

  45. Nam, J., Ngiam, J., Lee, H., & Slaney, M. (2011). A classification-based polyphonic piano transcription approach using learned feature representations. In 12th International Society for Music Information Retrieval Conference (pp. 175–180). Miami, Florida, USA.

  46. Nishikimi, R., Nakamura, E., Fukayama, S., Goto, M., & Yoshii, K. (2019). Automatic singing transcription based on encoder-decoder recurrent neural networks with a weakly-supervised attention mechanism. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing.

  47. Piszczalski, M., & Galler, B. A. (1977). Automatic music transcription. Computer Music Journal, 1(4), 24–31.

  48. Poliner, G., & Ellis, D. (2007). A discriminative model for polyphonic piano transcription. EURASIP Journal on Advances in Signal Processing, 8, 154–162.

  49. Raffel, C. (2016). Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching. Ph.D. thesis, Columbia University.

  50. Rebelo, A., Fujinaga, I., Paszkiewicz, F., Marcal, A. R. S., Guedes, C., & Cardoso, J. S. (2012). Optical music recognition: State-of-the-art and open issues. International Journal of Multimedia Information Retrieval, 1(3), 173–190. https://doi.org/10.1007/s13735-012-0004-6.

  51. Román, M. A., Pertusa, A., & Calvo-Zaragoza, J. (2019). A holistic approach to polyphonic music transcription with neural networks. In Proceedings of the 20th International Society for Music Information Retrieval Conference (pp. 731–737). Delft, Netherlands.

  52. Ruder, S. (2017). An overview of multi-task learning in deep neural networks. arXiv:1706.05098.

  53. Salamon, J., Gómez, E., Ellis, D., & Richard, G. (2014). Melody extraction from polyphonic music signals: Approaches, applications, and challenges. IEEE Signal Processing Magazine, 31(2), 118–134. https://doi.org/10.1109/MSP.2013.2271648.

  54. Serra, X., Magas, M., Benetos, E., Chudy, M., Dixon, S., Flexer, A., et al. (2013). Roadmap for music information research. Creative Commons BY-NC-ND 3.0 license.

  55. Sigtia, S., Benetos, E., & Dixon, S. (2016). An end-to-end neural network for polyphonic piano music transcription. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(5), 927–939. https://doi.org/10.1109/TASLP.2016.2533858.

  56. Smaragdis, P., & Brown, J. C. (2003). Non-negative matrix factorization for polyphonic music transcription. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (pp. 177–180). New Paltz, USA.

  57. Su, L., & Yang, Y. H. (2015). Escaping from the abyss of manual annotation: New methodology of building polyphonic datasets for automatic music transcription. In International Symposium on Computer Music Multidisciplinary Research.

  58. Thickstun, J., Harchaoui, Z., & Kakade, S. M. (2017). Learning features of music from scratch. In International Conference on Learning Representations (ICLR).

  59. Virtanen, T., Plumbley, M. D., & Ellis, D. P. W. (Eds.). (2018). Computational analysis of sound scenes and events. Springer.

  60. Wang, Q., Zhou, R., & Yan, Y. (2018). Polyphonic piano transcription with a note-based music language model. Applied Sciences, 8(3). https://doi.org/10.3390/app8030470.

  61. Wu, C., Dittmar, C., Southall, C., Vogl, R., Widmer, G., Hockman, J., et al. (2018). A review of automatic drum transcription. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(9), 1457–1483. https://doi.org/10.1109/TASLP.2018.2830113.

  62. Xi, Q., Bittner, R. M., Pauwels, J., Ye, X., & Bello, J. P. (2018). GuitarSet: A dataset for guitar transcription. In Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR (pp. 453–460). Paris, France.

  63. Ycart, A., & Benetos, E. (2017). A study on LSTM networks for polyphonic music sequence modelling. In 18th International Society for Music Information Retrieval Conference (ISMIR) (pp. 421–427).

  64. Ycart, A., & Benetos, E. (2018). A-MAPS: Augmented MAPS dataset with rhythm and key annotations. In 19th International Society for Music Information Retrieval Conference Late-Breaking and Demo Papers. Paris, France.

  65. Ycart, A., McLeod, A., Benetos, E., & Yoshii, K. (2019). Blending acoustic and language model predictions for automatic music transcription. In 20th International Society for Music Information Retrieval Conference (ISMIR).

  66. Yu, D., & Deng, L. (Eds.). (2015). Automatic speech recognition: A deep learning approach. London: Springer.


Acknowledgements

L. Liu is a research student at the UKRI Centre for Doctoral Training in Artificial Intelligence and Music and is supported by a China Scholarship Council and Queen Mary University of London joint Ph.D. scholarship. The work of E. Benetos was supported by RAEng Research Fellowship RF/128 and a Turing Fellowship.

Author information

Correspondence to Lele Liu.


Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Liu, L., Benetos, E. (2021). From Audio to Music Notation. In: Miranda, E.R. (eds) Handbook of Artificial Intelligence for Music. Springer, Cham. https://doi.org/10.1007/978-3-030-72116-9_24


  • DOI: https://doi.org/10.1007/978-3-030-72116-9_24


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72115-2

  • Online ISBN: 978-3-030-72116-9

  • eBook Packages: Computer Science, Computer Science (R0)
