Abstract
Text extraction is an important first step in digitizing historical documents. In this paper, we present a text extraction method for historical Tibetan document images based on block projections. Text extraction is treated as a text area detection and localization problem. Each image is divided into equal-sized blocks, and the blocks are filtered using the categories of their connected components and their corner point density. By analyzing the projections of the filtered blocks, the approximate text areas are located and the text regions are extracted. Experiments on a dataset of historical Tibetan documents demonstrate the effectiveness of the proposed method.
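The pipeline outlined in the abstract (equal blocks, block filtering, projection analysis) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the filtering rules, so the function names, the block size, and the `ink_density_filter` criterion below are all assumptions. In particular, a simple foreground-density test stands in for the paper's connected-component categories and corner point density.

```python
import numpy as np

def block_mask(binary, block, keep):
    """Split a binary image into equal blocks and flag those accepted by `keep`."""
    rows, cols = binary.shape[0] // block, binary.shape[1] // block
    mask = np.zeros((rows, cols), dtype=bool)
    for r in range(rows):
        for c in range(cols):
            tile = binary[r * block:(r + 1) * block, c * block:(c + 1) * block]
            mask[r, c] = keep(tile)
    return mask

def ink_density_filter(tile, lo=0.05, hi=0.6):
    # ASSUMPTION: stand-in for the paper's filter (connected-component
    # categories and corner point density). Keeps blocks whose share of
    # foreground pixels is neither near-empty nor near-solid.
    return bool(lo <= tile.mean() <= hi)

def text_area_from_projections(mask, block, min_count=1):
    """Project kept blocks onto both axes and return a bounding text area.

    Returns (top, bottom, left, right) in pixels, or None if no block was kept.
    """
    row_proj = mask.sum(axis=1)  # kept blocks per block-row
    col_proj = mask.sum(axis=0)  # kept blocks per block-column
    rows = np.flatnonzero(row_proj >= min_count)
    cols = np.flatnonzero(col_proj >= min_count)
    if rows.size == 0 or cols.size == 0:
        return None
    top, bottom = int(rows[0]) * block, (int(rows[-1]) + 1) * block
    left, right = int(cols[0]) * block, (int(cols[-1]) + 1) * block
    return top, bottom, left, right
```

Calling `text_area_from_projections(block_mask(img, 16, ink_density_filter), 16)` on a binarized page returns one rectangle that encloses every accepted block. A fuller version would split the projections at gaps to separate several text areas rather than returning a single box.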
Additional information
This work has been supported by the Innovation Platform Construction of Qinghai Province (No. 2016-ZJ-Y04) and the Basic Research Program of Qinghai Province (No. 2016-ZJ-740). This paper was presented in part at the CCF Chinese Conference on Computer Vision, Tianjin, 2017. This paper was recommended by the program committee.
About this article
Cite this article
Duan, Lj., Zhang, Xq., Ma, Ll. et al. Text extraction method for historical Tibetan document images based on block projections. Optoelectron. Lett. 13, 457–461 (2017). https://doi.org/10.1007/s11801-017-7197-0
DOI: https://doi.org/10.1007/s11801-017-7197-0