
Optical Music Recognition by Long Short-Term Memory Networks

  • Conference paper
Graphics Recognition. Current Trends and Evolutions (GREC 2017)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11009)



Abstract

Optical Music Recognition refers to the task of transcribing the image of a music score into a machine-readable format. Since many music scores are written on a single staff, they can be treated as a sequence. This work therefore explores the use of Long Short-Term Memory (LSTM) Recurrent Neural Networks to read the music score sequentially, with the LSTM keeping the context. For training, we used a synthetic dataset of more than 40,000 images labeled at the primitive level. The experimental results are promising and show the benefits of our approach.
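The central idea above, reading a single-staff image left to right as a sequence while an LSTM carries context, can be illustrated with a minimal pure-Python sketch. It assumes, for illustration only, that each pixel column is one time step, and it implements a standard LSTM cell (input, forget, and output gates plus a cell state) with random, untrained weights. The helper names (`image_to_column_sequence`, `LSTMCell`, `run_lstm`) and the toy image are hypothetical; this is not the authors' architecture, whose details the abstract does not give. A trained system would learn the weights, add a per-step classifier over the primitive labels, and would normally rely on a framework layer such as PyTorch's `torch.nn.LSTM`.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def image_to_column_sequence(image: Matrix) -> List[Vector]:
    """Turn a single-staff image (rows x cols) into a left-to-right sequence.

    Assumption for illustration: each pixel column is one time step.
    """
    height, width = len(image), len(image[0])
    return [[float(image[r][c]) for r in range(height)] for c in range(width)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w: Matrix, v: Vector) -> Vector:
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


class LSTMCell:
    """Standard LSTM cell with untrained random weights (hypothetical sketch)."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        n = input_size + hidden_size  # each gate sees [x_t, h_{t-1}]
        self.hidden_size = hidden_size
        self.w: Dict[str, Matrix] = {
            g: [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(hidden_size)]
            for g in self.GATES
        }
        self.b: Dict[str, Vector] = {g: [0.0] * hidden_size for g in self.GATES}

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h  # list concatenation: [x_t, h_{t-1}]
        pre = {g: [a + b for a, b in zip(matvec(self.w[g], z), self.b[g])] for g in self.GATES}
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        g = [math.tanh(v) for v in pre["candidate"]]
        # The cell state carries context across columns; the forget gate decides what to keep.
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def run_lstm(cell: LSTMCell, sequence: List[Vector]) -> List[Vector]:
    """Feed the sequence step by step and return the hidden state at each step."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    outputs = []
    for x in sequence:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs


if __name__ == "__main__":
    # Toy 6 x 8 binary "staff" image (1 = ink); purely illustrative data.
    toy = [
        [0, 0, 0, 1, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0, 0, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 1, 1, 0, 0, 1, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
    ]
    seq = image_to_column_sequence(toy)
    states = run_lstm(LSTMCell(input_size=len(toy), hidden_size=4), seq)
    print(len(seq), len(states), len(states[-1]))  # prints: 8 8 4
```

Because `h` and `c` are threaded from one column to the next, altering an early column also changes the final hidden state; this is the context-keeping property the approach relies on.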




Acknowledgment

This work has been partially supported by the Spanish project TIN2015-70924-C2-2-R, the Ramon y Cajal Fellowship RYC-2014-16831, the CERCA Program/Generalitat de Catalunya, FPU fellowship FPU15/06264 from the Spanish Ministerio de Educación, Cultura y Deporte, the Social Sciences and Humanities Research Council of Canada, and the FI fellowship AGAUR 2018 FI_B 00546 of the Generalitat de Catalunya. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

Author information

Corresponding author

Correspondence to Arnau Baró.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Baró, A., Riba, P., Calvo-Zaragoza, J., Fornés, A. (2018). Optical Music Recognition by Long Short-Term Memory Networks. In: Fornés, A., Lamiroy, B. (eds) Graphics Recognition. Current Trends and Evolutions. GREC 2017. Lecture Notes in Computer Science, vol 11009. Springer, Cham. https://doi.org/10.1007/978-3-030-02284-6_7


  • DOI: https://doi.org/10.1007/978-3-030-02284-6_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02283-9

  • Online ISBN: 978-3-030-02284-6

  • eBook Packages: Computer Science, Computer Science (R0)
