Tandem hidden Markov models using deep belief networks for offline handwriting recognition

Abstract

Unconstrained offline handwriting recognition is a challenging task in document analysis and pattern recognition. In recent years, to better exploit the supervisory information contained in document images, considerable effort has been devoted to integrating multi-layer perceptrons (MLPs) into hidden Markov models (HMMs) in either a hybrid or a tandem fashion. However, because of the limited learning capacity of MLPs, the learnt features are not necessarily optimal for the subsequent recognition task. In this paper, we propose a tandem approach based on a deep architecture for unconstrained offline handwriting recognition. In the proposed model, deep belief networks (DBNs) learn compact representations of the sequential data, while HMMs perform (sub-)word recognition. We evaluate the proposed model on two publicly available datasets, RIMES and IFN/ENIT, which contain Latin-script and Arabic-script handwriting respectively, and on a Devanagari (Indian script) dataset that we collected ourselves. Extensive experiments show the advantage of the proposed model, especially over MLP-HMM tandem approaches.
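
The tandem pipeline summarized above (a DBN producing compact features that an HMM then decodes) can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation. It assumes binary frame vectors taken from a sliding window over the word image, pretrains a small DBN greedily with one-step contrastive divergence (CD-1), uses the top hidden layer's activations as the compact features, and scores them with diagonal-Gaussian HMMs through the log-domain forward algorithm. All function names, layer sizes, learning rates and the toy data are hypothetical. Supervised fine-tuning of the DBN and Baum-Welch training of the HMMs are omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli-Bernoulli restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr):
        """One contrastive-divergence update on a batch of frames v0 of shape (N, V)."""
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h0_sample)  # mean-field reconstruction
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)


def pretrain_dbn(X, layer_sizes, epochs=20, lr=0.1, seed=0):
    """Greedy layer-wise pretraining: each RBM models the outputs of the layer below."""
    rng = np.random.default_rng(seed)
    rbms, inp = [], X
    for n_hidden in layer_sizes:
        rbm = RBM(inp.shape[1], n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_step(inp, lr)
        rbms.append(rbm)
        inp = rbm.hidden_probs(inp)
    return rbms


def dbn_features(rbms, X):
    """Deterministic up-pass: the top layer's activations are the tandem features."""
    for rbm in rbms:
        X = rbm.hidden_probs(X)
    return X


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian, summed over the last axis."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)


def forward_loglik(obs, log_pi, log_A, means, vars_):
    """Log-domain forward algorithm: log p(obs | HMM) for obs of shape (T, D)."""
    log_b = log_gauss_diag(obs[:, None, :], means[None], vars_[None])  # (T, S)
    alpha = log_pi + log_b[0]
    for t in range(1, obs.shape[0]):
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_b[t]
    return float(np.logaddexp.reduce(alpha))


def recognize(obs, word_models):
    """Pick the word whose HMM assigns the highest likelihood to the features."""
    scores = {w: forward_loglik(obs, *params) for w, params in word_models.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical toy data and sizes; not the paper's configuration.
    rng = np.random.default_rng(0)
    frames = (rng.random((40, 16)) < 0.3).astype(float)  # binary window vectors
    rbms = pretrain_dbn(frames, layer_sizes=[12, 6])
    feats = dbn_features(rbms, frames)  # (40, 6) compact features
    # Two-state left-to-right HMM; log(0) = -inf forbids backward transitions.
    with np.errstate(divide="ignore"):
        log_pi = np.log([1.0, 0.0])
        log_A = np.log([[0.6, 0.4], [0.0, 1.0]])
    means = np.stack([feats[:20].mean(axis=0), feats[20:].mean(axis=0)])
    vars_ = np.stack([feats[:20].var(axis=0), feats[20:].var(axis=0)]) + 1e-3
    print(forward_loglik(feats, log_pi, log_A, means, vars_))
```

In a complete tandem system, each (sub-)word HMM would presumably be trained on the DBN features (for example with Baum-Welch), and `recognize` would compare those trained models. Here the HMM parameters are set directly from segment statistics purely for illustration.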

Author information

Corresponding author

Correspondence to Partha Pratim Roy.

Additional information

Project supported by the National Natural Science Foundation of China (No. 61403353)

About this article

Cite this article

Roy, P.P., Zhong, G. & Cheriet, M. Tandem hidden Markov models using deep belief networks for offline handwriting recognition. Frontiers Inf Technol Electronic Eng 18, 978–988 (2017). https://doi.org/10.1631/FITEE.1600996

Key words

  • Handwriting recognition
  • Hidden Markov models
  • Deep learning
  • Deep belief networks
  • Tandem approach

CLC number

  • TP391