Abstract
Characters in historical seals are valuable for research on the corresponding documents. Extracting characters from these seals is challenging because of their special characteristics, such as the different carving types (rilievi or diaglyph) and varying border forms. As a result, most existing character extraction methods do not work well on images of historical seals. In this paper, a new character extraction method is proposed based on novel features of scanned seal images. First, the border width feature and structure feature of each scanned seal image are extracted by analyzing the probability density distribution of the border width and the brightness of each column. Meanwhile, the optimal border width for rilievi, or the starting position of the border for diaglyph, is generated for subsequent border removal and character extraction. The tri-border feature is used to differentiate between diaglyph and tri-border diaglyph. Then a decision tree is built to classify the seal images into rilievi and diaglyph. The classification result is used to separate background, border, and characters. After the border is removed using the optimal border width estimated from the border width feature, the characters are finally extracted from the seal image. Experimental results on a real data set show that the proposed method classifies scanned seal images accurately and extracts the characters effectively.
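The column-wise analysis described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a binarized seal image given as a list of rows (1 = ink, 0 = background), replaces the probability density analysis with a fixed threshold, and only estimates the width of the left border. The names `column_ink_fraction`, `estimate_left_border_width`, `make_framed`, and the `threshold` parameter are hypothetical and chosen for illustration.

```python
from typing import List

# Assumed input format: binary image, 1 = ink (dark) pixel, 0 = background.
Image = List[List[int]]


def column_ink_fraction(img: Image) -> List[float]:
    """Return the fraction of ink pixels in each column."""
    rows = len(img)
    cols = len(img[0])
    return [sum(img[r][c] for r in range(rows)) / rows for c in range(cols)]


def estimate_left_border_width(img: Image, threshold: float = 0.9) -> int:
    """Count consecutive near-solid columns at the left edge of the seal.

    Blank margin columns are skipped first; then columns whose ink
    fraction is at least `threshold` are counted as border columns.
    The fixed threshold is an assumption of this sketch.
    """
    profile = column_ink_fraction(img)
    start = 0
    while start < len(profile) and profile[start] == 0.0:
        start += 1  # skip empty margin before the seal begins
    width = 0
    while start + width < len(profile) and profile[start + width] >= threshold:
        width += 1
    return width


def make_framed(size: int, border: int) -> Image:
    """Build a synthetic square image with a solid frame of the given width."""
    return [
        [
            1 if (r < border or r >= size - border
                  or c < border or c >= size - border) else 0
            for c in range(size)
        ]
        for r in range(size)
    ]


if __name__ == "__main__":
    seal = make_framed(8, 2)
    print(estimate_left_border_width(seal))
```

On the synthetic 8 x 8 frame above, the two leftmost columns are fully inked while the interior columns are only half inked by the top and bottom border, so the estimate stops after two columns. A real scanned seal would need binarization first and a more robust criterion than a single threshold.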
Funding
Funding was provided by Natural Science Foundation of Hunan Province (No. 2018JJ3071).
About this article
Cite this article
Dai, T., Sun, B. Novel Features for Character Extraction of Historical Chinese Seal Images. Sens Imaging 20, 32 (2019). https://doi.org/10.1007/s11220-019-0253-z