Abstract
The amount of digitized legacy documents has been rising over the last years due mainly to the increasing number of on-line digital libraries publishing this kind of documents. The vast majority of them remain waiting to be transcribed to provide historians and other researchers new ways of indexing, consulting and querying them. However, the performance accuracy of state-of-the-art Handwritten Text Recognition techniques decreases dramatically when they are applied to these historical documents. This is mainly due to the typical paper degradation problems. Therefore, robust pre-processing techniques is an important step for helping further recognition steps. This paper proposes to take existing binarization techniques, in order to retain their advantages, and modify them in such a way that some of the original grayscale information is preserved and be considered by the subsequent recognizer. Results are reported with the publicly available ESPOSALLES database.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
The corpus is publicly available at: http://www.cvc.uab.es/5cofm/groundtruth.
References
Drida, F.: Towards restoring historic documents degraded over time. In: Proceedings of 2nd IEEE International Conference on Document Image Analysis for Libraries (DIAL 2006), Lyon, France, pp. 350–357 (2006)
Graves, A., Liwicki, M., Fernandez, S., Bertolami, R., Bunke, H., Schmidhuber, J.: A novel connectionist system for unconstrained handwriting recognition. IEEE Trans. Pattern Anal. Mach. Intell. 31(5), 855–868 (2009)
Jelinek, F.: Statistical Methods for Speech Recognition. MIT Press, Cambridge (1998)
Khurshid, K., Siddiqi, I., Faure, C., Vincent, N.: Comparison of niblack inspired binarization methods for ancient documents. In: Berkner, K., Likforman-Sulem, L. (eds.) 16th Document Recognition and Retrieval Conference, DRR 2009, SPIE Proceedings, vol. 7247, pp. 1–10. SPIE, San Jose (18–22 January 2009). doi:10.1117/12.805827
Kneser, R., Ney, H.: Improved backing-off for m-gram language modeling, Detroit, USA, vol. 1, pp. 181–184 (1995)
Marti, U., Bunke, H.: Using a statistical language model to improve the preformance of an HMM-based cursive handwriting recognition system. IJPRAI 15(1), 65–90 (2001)
Niblack, W.: An Introduction to Digital Image Processing, pp. 115–116. Prentice-Hall, Englewood Cliffs (1986)
Romero, V., Fornés, A., Serrano, N., Sánchez, J., Toselli, A., Frinken, V., Vidal, E., Lladós, J.: The ESPOSALLES database: An ancient marriage license corpus for off-line handwriting recognition. Pattern Recogn. 46, 1658–1669 (2013). doi:10.1016/j.patcog.2012.11.024
España-Boquera, S., Castro-Bleda, M.J., Gorbe-Moya, J., Zamora-Martínez, F.: Improving offline handwriting text recognition with hybrid hmm/ann models. IEEE Trans. Pattern Anal. Mach. Intell. 33(4), 767–779 (2011)
Sauvola, J., Pietikäinen, M.: Adaptive document image binarization. Pattern Recog. 33(2), 225–236 (2000). doi:10.1016/S0031-3203(99)00055-2
Shafait, F., Keysers, D., Breuel, T.M.: Efficient implementation of local adaptive thresholding techniques using integral images. In: Proceedings of the SPIE 6815, Document Recognition and Retrieval XV, 681510, pp. 1–6, January 2008. doi:10.1117/12.767755
Toselli, A.H., Juan, A., Keysers, D., González, J., Salvador, I., Ney, H., Vidal, E., Casacuberta, F.: Integrated handwriting recognition and interpretation using finite-state models. Int. J. Pattern Recog. Artif. Intell. 18(4), 519–539 (2004). doi:10.1142/S0218001404003344
Acknowledgments
The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement No. 600707 - tranScriptorium and the Spanish MEC under the STraDA project (TIN2012-37475-C02-01).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Villegas, M., Romero, V., Sánchez, J.A. (2015). On the Modification of Binarization Algorithms to Retain Grayscale Information for Handwritten Text Recognition. In: Paredes, R., Cardoso, J., Pardo, X. (eds) Pattern Recognition and Image Analysis. IbPRIA 2015. Lecture Notes in Computer Science(), vol 9117. Springer, Cham. https://doi.org/10.1007/978-3-319-19390-8_24
Download citation
DOI: https://doi.org/10.1007/978-3-319-19390-8_24
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19389-2
Online ISBN: 978-3-319-19390-8
eBook Packages: Computer ScienceComputer Science (R0)