International Journal on Digital Libraries, Volume 20, Issue 1, pp 49–59

Heuristic and supervised approaches to handwritten annotation extraction for musical score images

  • Eamonn Bell
  • Laurent Pugin


Performers’ copies of musical scores are typically rich in handwritten annotations, which capture historical and institutional performance practices. The development of interactive interfaces to explore digital archives of these scores and the systematic investigation of their meaning and function will be facilitated by the automatic extraction of handwritten score annotations. We present several approaches to the extraction of handwritten annotations of arbitrary content from digitized images of musical scores. First, we show promising results in certain contexts when using simple unsupervised clustering techniques to identify handwritten annotations in conductors’ scores. Next, we compare annotated scores to unannotated copies and use a printed sheet music comparison tool, Aruspix, to recover handwritten annotations as additions to the clean copy. Using both of these techniques in a combined annotation pipeline qualitatively improves the recovery of handwritten annotations. Recent work has shown the effectiveness of reframing classical optical music recognition tasks as supervised machine learning classification tasks. In the same spirit, we pose the problem of handwritten annotation extraction as a supervised pixel classification task, where the feature space for the learning task is derived from the intensities of neighboring pixels. After an initial investment of time required to develop dependable training data, this approach can reliably extract annotations for entire volumes of score images without further supervision. These techniques are demonstrated using a sample of orchestral scores annotated by professional conductors of the New York Philharmonic Orchestra. Handwritten annotation extraction in musical scores has applications to the systematic investigation of performers’ score annotation practices, to annotator attribution, and to the interactive presentation of annotated scores, each of which we briefly discuss.
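The supervised pixel classification framing can be illustrated with a minimal sketch. It assumes a grayscale image stored as nested lists and a hand-labelled mask (1 for annotation, 0 otherwise). The function names, the window radius, and the k-nearest-neighbor classifier are all illustrative assumptions, not the classifier or parameters used in the article; the only element taken from the abstract is that each pixel's features are the intensities of its neighboring pixels.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale intensities, 0 (black) to 255 (white)
Sample = Tuple[List[int], int]  # (feature vector, label)


def neighborhood_features(image: Image, row: int, col: int, radius: int = 1) -> List[int]:
    """Return the intensities in the square window centred on (row, col).

    Coordinates that fall outside the image are clamped to the nearest edge.
    """
    height, width = len(image), len(image[0])
    features = []
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r = min(max(row + dr, 0), height - 1)
            c = min(max(col + dc, 0), width - 1)
            features.append(image[r][c])
    return features


def build_training_set(image: Image, mask: Image, radius: int = 1) -> List[Sample]:
    """Pair every pixel's feature vector with its label from the mask."""
    return [
        (neighborhood_features(image, r, c, radius), mask[r][c])
        for r in range(len(image))
        for c in range(len(image[0]))
    ]


def classify_pixel(features: List[int], samples: List[Sample], k: int = 3) -> int:
    """Label a pixel by majority vote among its k nearest training samples."""
    def squared_distance(a: List[int], b: List[int]) -> int:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(samples, key=lambda s: squared_distance(features, s[0]))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def extract_annotations(image: Image, samples: List[Sample],
                        radius: int = 1, k: int = 3) -> Image:
    """Return a binary mask marking the pixels classified as annotation."""
    return [
        [classify_pixel(neighborhood_features(image, r, c, radius), samples, k)
         for c in range(len(image[0]))]
        for r in range(len(image))
    ]


# Toy data (invented for illustration): white paper, black print, gray pencil.
train_image = [
    [255, 255, 255, 255],
    [0, 0, 0, 0],
    [255, 255, 255, 255],
    [255, 120, 120, 255],
]
train_mask = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
samples = build_training_set(train_image, train_mask, radius=0)
print(extract_annotations([[255, 118], [5, 250]], samples, radius=0, k=1))
```

With a radius of 0 the classifier sees only each pixel's own intensity; a larger radius lets it use local context, at the cost of needing more labelled examples. A brute-force nearest-neighbor search like this one is far too slow for full-page scans and is meant only to show the shape of the feature space.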


Annotation extraction, Image processing, Color clustering, Supervised pixel classification, Orchestral scores, Conducting, Image superimposition



The authors wish to thank Barbara Haws at the New York Philharmonic Archives and Mitchell Brodsky for their technical support and encouragement, and the anonymous reviewers for their feedback and suggestions. Original score images are courtesy of the Leon Levy Digital Archive, New York Philharmonic.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Music, Columbia University, New York, USA
  2. RISM (Switzerland), Bern 6, Switzerland
