Hybrid hidden Markov models and artificial neural networks for handwritten music recognition in mensural notation
In this paper, we present a hybrid approach using hidden Markov models (HMMs) and artificial neural networks to deal with the task of handwritten music recognition in mensural notation. Previous work has shown that the task can be addressed with Gaussian-density HMMs that can be trained and used in an end-to-end manner, that is, without prior segmentation of the symbols. However, the results achieved using that approach are not sufficiently accurate to be useful in practice. In this work, we hybridize HMMs with deep multilayer perceptrons (MLPs), which leads to remarkable improvements in optical symbol modeling. Moreover, this hybrid architecture maintains important advantages of HMMs, such as the ability to properly model variable-length symbol sequences through segmentation-free training, and the simplicity and robustness of combining optical models with N-gram language models, which provide statistical a priori information about regularities in musical symbol concatenation observed in the training data. The results obtained with the proposed hybrid MLP-HMM approach outperform previous work by a wide margin, achieving symbol-level error rates around 26%, as compared with about 40% reported previously.
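The abstract does not spell out how the symbol-level error rate is computed. A minimal sketch follows, assuming the usual definition from speech and handwriting recognition: the edit (Levenshtein) distance between the recognized and reference symbol sequences, divided by the total number of reference symbols. The function names and the example symbol labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences.

    Counts the minimum number of substitutions, insertions and
    deletions needed to turn `hyp` into `ref`.
    """
    # prev[j] holds the distance between the first i-1 reference
    # symbols and the first j hypothesis symbols.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from i reference symbols to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def symbol_error_rate(refs: Sequence[Sequence[str]],
                      hyps: Sequence[Sequence[str]]) -> float:
    """Symbol error rate in percent over a set of staff transcripts.

    Assumed definition: total edit operations divided by the total
    number of reference symbols.
    """
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_symbols = sum(len(r) for r in refs)
    return 100.0 * total_errors / total_symbols


# Hypothetical example: one staff with four reference symbols,
# one of which is missed by the recognizer.
reference = [["clef", "minim", "semibreve", "rest"]]
hypothesis = [["clef", "minim", "rest"]]
ser = symbol_error_rate(reference, hypothesis)
```

Under this assumed definition, a rate around 26% would mean roughly one edit operation for every four reference symbols.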
Keywords: Handwritten music recognition, Mensural notation, Hidden Markov models, Artificial neural networks, N-gram language models
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.