On the Modification of Binarization Algorithms to Retain Grayscale Information for Handwritten Text Recognition

  • Conference paper
Pattern Recognition and Image Analysis (IbPRIA 2015)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9117)

Abstract

The number of digitized legacy documents has been rising in recent years, mainly because of the increasing number of on-line digital libraries publishing this kind of document. The vast majority of them are still waiting to be transcribed, which would give historians and other researchers new ways of indexing, consulting and querying them. However, the accuracy of state-of-the-art Handwritten Text Recognition techniques decreases dramatically when they are applied to these historical documents, mainly because of typical paper degradation problems. Robust pre-processing is therefore an important step that helps the subsequent recognition steps. This paper proposes to take existing binarization techniques and, in order to retain their advantages, modify them in such a way that some of the original grayscale information is preserved and can be considered by the subsequent recognizer. Results are reported on the publicly available ESPOSALLES database.
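
To make the general idea concrete, the following is a minimal sketch and not the method evaluated in the paper, whose exact formulation is not given in this abstract. It assumes a Sauvola-style local threshold T = m(1 + k(s/R - 1)), computed with integral images, and replaces the hard 0/255 decision with a linear ramp around T so that pixels close to the threshold keep an intermediate gray value. The function names, the `ramp` parameter and the default values of `window`, `k` and `R` are illustrative assumptions.

```python
import numpy as np


def local_mean_std(img, window):
    """Local mean and standard deviation over a square window of odd size,
    computed with integral images (summed-area tables)."""
    img = img.astype(np.float64)
    r = window // 2
    padded = np.pad(img, r, mode="reflect")
    # Integral images with a leading row and column of zeros.
    s1 = np.pad(padded, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    s2 = np.pad(padded ** 2, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    h, w = img.shape

    def box(s):
        # Sum over the window centred on each original pixel.
        return (s[window:window + h, window:window + w]
                - s[:h, window:window + w]
                - s[window:window + h, :w]
                + s[:h, :w])

    n = window * window
    mean = box(s1) / n
    var = np.maximum(box(s2) / n - mean ** 2, 0.0)  # guard against rounding
    return mean, np.sqrt(var)


def soft_sauvola(img, window=15, k=0.2, R=128.0, ramp=32.0):
    """Sauvola-style thresholding with a soft transition (assumed variant).

    Pixels darker than T - ramp/2 become 0 (ink), pixels brighter than
    T + ramp/2 become 255 (background), and pixels in between keep a
    gray level proportional to their distance from the local threshold.
    """
    mean, std = local_mean_std(img, window)
    threshold = mean * (1.0 + k * (std / R - 1.0))
    soft = np.clip(0.5 + (img.astype(np.float64) - threshold) / ramp, 0.0, 1.0)
    return np.rint(soft * 255.0).astype(np.uint8)


if __name__ == "__main__":
    page = np.full((20, 20), 230, dtype=np.uint8)  # light background
    page[:, 10] = 30    # strong stroke
    page[:, 15] = 180   # faint stroke near the local threshold
    out = soft_sauvola(page, window=5)
    print(out[10, 2], out[10, 10], out[10, 15])
```

In this sketch a very small `ramp` approaches ordinary hard binarization, while a larger value lets more of the original gray levels through to the recognizer; the faint stroke in the toy image ends up at an intermediate gray level instead of being forced to pure black.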

Notes

  1. The corpus is publicly available at: http://www.cvc.uab.es/5cofm/groundtruth.

Acknowledgments

The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement No. 600707 - tranScriptorium and the Spanish MEC under the STraDA project (TIN2012-37475-C02-01).

Author information

Corresponding author

Correspondence to Mauricio Villegas.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Villegas, M., Romero, V., Sánchez, J.A. (2015). On the Modification of Binarization Algorithms to Retain Grayscale Information for Handwritten Text Recognition. In: Paredes, R., Cardoso, J., Pardo, X. (eds) Pattern Recognition and Image Analysis. IbPRIA 2015. Lecture Notes in Computer Science, vol 9117. Springer, Cham. https://doi.org/10.1007/978-3-319-19390-8_24

  • DOI: https://doi.org/10.1007/978-3-319-19390-8_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19389-2

  • Online ISBN: 978-3-319-19390-8

  • eBook Packages: Computer Science, Computer Science (R0)