On the Modification of Binarization Algorithms to Retain Grayscale Information for Handwritten Text Recognition

  • Mauricio Villegas
  • Verónica Romero
  • Joan Andreu Sánchez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9117)

Abstract

The number of digitized legacy documents has been rising in recent years, mainly due to the increasing number of online digital libraries publishing this kind of document. The vast majority of these documents are still waiting to be transcribed, which would provide historians and other researchers with new ways of indexing, consulting and querying them. However, the accuracy of state-of-the-art Handwritten Text Recognition techniques decreases dramatically when they are applied to historical documents, mainly because of typical paper degradation problems. Robust pre-processing is therefore an important step for helping the subsequent recognition stages. This paper proposes to take existing binarization techniques, in order to retain their advantages, and modify them in such a way that some of the original grayscale information is preserved and can be exploited by the subsequent recognizer. Results are reported on the publicly available ESPOSALLES database.
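The abstract does not spell out how the binarization is modified, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It computes Sauvola's local threshold T = m * (1 + k * (s / R - 1)), where m and s are the local mean and standard deviation, using integral images. It then contrasts the conventional hard output with a hypothetical soft variant that divides each pixel by its local threshold: background saturates to white as in binarization, while ink pixels keep graded gray levels. The soft variant, the function names and the defaults w=15, k=0.2, R=128 are assumptions chosen for illustration.

```python
import numpy as np


def local_mean_std(img: np.ndarray, w: int) -> tuple[np.ndarray, np.ndarray]:
    """Mean and standard deviation over a (2w+1) x (2w+1) window around
    each pixel, computed with integral images (summed-area tables).
    Windows are clipped at the image borders."""
    img = img.astype(np.float64)
    h, wd = img.shape
    # Integral images with a leading row and column of zeros, so that
    # ii[i, j] == img[:i, :j].sum().
    ii = np.zeros((h + 1, wd + 1))
    ii2 = np.zeros((h + 1, wd + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    ii2[1:, 1:] = (img ** 2).cumsum(axis=0).cumsum(axis=1)
    # Window bounds [y0, y1) x [x0, x1) for every pixel, broadcast to (h, wd).
    rows = np.arange(h)
    cols = np.arange(wd)
    y0 = np.clip(rows - w, 0, h)[:, None]
    y1 = np.clip(rows + w + 1, 0, h)[:, None]
    x0 = np.clip(cols - w, 0, wd)[None, :]
    x1 = np.clip(cols + w + 1, 0, wd)[None, :]
    area = (y1 - y0) * (x1 - x0)
    s1 = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
    s2 = ii2[y1, x1] - ii2[y0, x1] - ii2[y1, x0] + ii2[y0, x0]
    mean = s1 / area
    var = np.maximum(s2 / area - mean ** 2, 0.0)  # guard tiny negative round-off
    return mean, np.sqrt(var)


def sauvola_threshold(img, w=15, k=0.2, r=128.0):
    """Sauvola's local threshold T = m * (1 + k * (s / R - 1))."""
    mean, std = local_mean_std(img, w)
    return mean * (1.0 + k * (std / r - 1.0))


def sauvola_binarize(img, w=15, k=0.2, r=128.0):
    """Conventional hard output: ink (<= T) -> 0, background -> 255."""
    t = sauvola_threshold(img, w, k, r)
    return np.where(img.astype(np.float64) <= t, 0, 255).astype(np.uint8)


def sauvola_keep_gray(img, w=15, k=0.2, r=128.0):
    """Hypothetical soft variant (an assumption, not the paper's method):
    scale each pixel by 255 / T. Pixels at or above their local threshold
    map to white, while darker pixels keep a gray level proportional to
    their original intensity."""
    t = np.maximum(sauvola_threshold(img, w, k, r), 1e-6)  # avoid division by zero
    out = 255.0 * img.astype(np.float64) / t
    return np.rint(np.clip(out, 0.0, 255.0)).astype(np.uint8)
```

For instance, on a 20x20 page of intensity 200 with a two-pixel vertical stroke whose columns have intensities 40 and 90, and w=5, `sauvola_binarize` maps both stroke columns to 0, whereas `sauvola_keep_gray` keeps the darker column darker than the lighter one while the background stays white. This is the kind of gray-level information a hard threshold discards and a subsequent recognizer could, in principle, exploit.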

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Mauricio Villegas (1)
  • Verónica Romero (1)
  • Joan Andreu Sánchez (1)
  1. PRHLT, Universitat Politècnica de València, València, Spain
