Document segmentation and classification into musical scores and text

  • Fabrizio Pedersoli
  • George Tzanetakis
Original Paper


A new algorithm for segmenting documents into regions containing musical scores and text is proposed. Such segmentation is a required step before applying optical character recognition and optical music recognition to scanned pages that contain both music notation and text. Our segmentation technique is based on the bag-of-visual-words representation followed by random block voting (RBV) to detect the bounding boxes containing the musical score and text within a document image. The RBV procedure consists of extracting a fixed number of blocks whose position and size are sampled from a discrete uniform distribution that “over”-covers the input image. Each block is automatically classified as coming from either musical score or text, and it votes over its spatial extent with the posterior probability of that classification. An initial coarse segmentation is obtained by accumulating all the votes in a single image. Subsequently, the final segmentation is obtained by subdividing the image into microblocks and classifying them using an N-nearest neighbor classifier trained on the coarse segmentation. We demonstrate the potential of the proposed method through experiments on two datasets. The first is a challenging dataset of images collected, artificially combined, and manipulated for this project. The second is a music dataset obtained by scanning two music books. The results are reported using precision/recall metrics of the overlapping area with respect to the ground truth. The proposed system achieves an overall averaged F-measure of 85%. The complete source code package and associated data are available under the FreeBSD license to support reproducibility.
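
The voting step and the area-based evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' released code. The names `random_block_voting`, `overlap_f_measure`, and `classify_block`, the block sizes, and the block count are all hypothetical. The bag-of-visual-words classifier is replaced by an arbitrary callable that returns the posterior probability of "musical score", and the microblock refinement stage is omitted. The metric assumes that precision and recall are computed on pixel areas of binary masks.

```python
import numpy as np


def random_block_voting(image, classify_block, n_blocks=500,
                        sizes=(16, 32, 64), seed=0):
    """Accumulate per-pixel votes from randomly placed square blocks.

    classify_block(block) is a stand-in (assumption) for the paper's
    bag-of-visual-words classifier; it must return the posterior
    probability, in [0, 1], that the block shows musical score.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    score_votes = np.zeros((h, w), dtype=float)
    text_votes = np.zeros((h, w), dtype=float)

    for _ in range(n_blocks):
        size = int(rng.choice(sizes))
        # Corners may fall outside the image so that border pixels are
        # covered as often as interior ones ("over"-covering).
        y = int(rng.integers(-size + 1, h))
        x = int(rng.integers(-size + 1, w))
        y0, y1 = max(y, 0), min(y + size, h)
        x0, x1 = max(x, 0), min(x + size, w)

        p_score = float(classify_block(image[y0:y1, x0:x1]))
        # The block votes for its predicted class, weighted by the
        # posterior probability of that class.
        if p_score >= 0.5:
            score_votes[y0:y1, x0:x1] += p_score
        else:
            text_votes[y0:y1, x0:x1] += 1.0 - p_score

    return score_votes, text_votes


def overlap_f_measure(pred_mask, true_mask):
    """Pixel-area precision, recall, and F-measure of two binary masks."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    true_mask = np.asarray(true_mask, dtype=bool)
    overlap = np.logical_and(pred_mask, true_mask).sum()
    precision = overlap / pred_mask.sum() if pred_mask.sum() else 0.0
    recall = overlap / true_mask.sum() if true_mask.sum() else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy usage: the top half of the page plays the role of "score" (value 1),
# the bottom half plays "text" (value 0); the mean intensity of a block
# serves as a fake posterior.
page = np.zeros((200, 200))
page[:100] = 1.0
score_votes, text_votes = random_block_voting(page, lambda b: b.mean())
coarse_score_mask = score_votes > text_votes
```

Thresholding the two vote images against each other yields a coarse mask of the kind the abstract describes. In the full method that mask would then supply training labels for the nearest-neighbor classification of microblocks.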


Musical score detection · Document segmentation · Bag of visual words · Random voting



We would like to thank the Social Sciences and Humanities Research Council of Canada and the University of Brescia, Italy, for funding this work.




Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Computer Science Department, University of Victoria, Victoria, Canada
