Text extraction method for historical Tibetan document images based on block projections

Duan, Li-juan; Zhang, Xi-qun; Ma, Long-long; Wu, Jian

doi:10.1007/s11801-017-7197-0

Published: 17 November 2017

Volume 13, pages 457–461 (2017)

Optoelectronics Letters

Li-juan Duan (段立娟)^1,2,
Xi-qun Zhang (张西群)^1,3,
Long-long Ma (马龙龙)^4 &
Jian Wu (吴健)^4

Abstract

Text extraction is an important initial step in digitizing historical documents. In this paper, we present a text extraction method for historical Tibetan document images based on block projections. Text extraction is treated as a text area detection and localization problem. Each image is divided into equal-sized blocks, and the blocks are filtered using the categories of their connected components and their corner point density. By analyzing the projections of the filtered blocks, the approximate text areas are located and the text regions are extracted. Experiments on a dataset of historical Tibetan documents demonstrate the effectiveness of the proposed method.
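The block-projection idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the filtering rules, so the sketch substitutes a simple per-block ink-density test for the connected component categories and corner point density. The function name `locate_text_area` and all parameters (`block`, `min_density`, `max_density`, `min_blocks`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def locate_text_area(binary, block=8, min_density=0.05, max_density=0.6, min_blocks=2):
    """Return (top, bottom, left, right) pixel bounds of the approximate
    text area in a binary image (1 = ink), or None if nothing is found."""
    h, w = binary.shape
    rows, cols = h // block, w // block
    # Crop to a whole number of blocks and compute the ink fraction per block.
    grid = binary[:rows * block, :cols * block].reshape(rows, block, cols, block)
    density = grid.mean(axis=(1, 3))
    # ASSUMPTION: stand-in filter. The paper filters blocks by connected
    # component categories and corner point density; here we only keep
    # blocks whose ink density falls in a text-like range.
    keep = (density >= min_density) & (density <= max_density)
    # Project the kept blocks onto the vertical and horizontal axes.
    row_proj = keep.sum(axis=1)
    col_proj = keep.sum(axis=0)
    r = np.flatnonzero(row_proj >= min_blocks)
    c = np.flatnonzero(col_proj >= min_blocks)
    if r.size == 0 or c.size == 0:
        return None
    return (int(r[0]) * block, (int(r[-1]) + 1) * block,
            int(c[0]) * block, (int(c[-1]) + 1) * block)

# Synthetic page: a solid dark band at the top (e.g. a border) and a
# striped "text" region occupying rows 16..47 and columns 8..55.
page = np.zeros((64, 64), dtype=np.uint8)
page[0:8, :] = 1
page[16:48:2, 8:56] = 1
bounds = locate_text_area(page)
print(bounds)  # (16, 48, 8, 56)
```

In this toy example the solid band is rejected because its blocks are fully inked, while the striped region passes the density test and its extent is recovered from the row and column projections. A real pipeline would replace the density test with the connected component and corner point criteria described in the paper.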

Author information

Authors and Affiliations

Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
Li-juan Duan (段立娟) & Xi-qun Zhang (张西群)
Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, Beijing University of Technology, Beijing, 100124, China
Li-juan Duan (段立娟)
Beijing Key Laboratory of Trusted Computing, Beijing University of Technology, Beijing, 100124, China
Xi-qun Zhang (张西群)
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing, 100190, China
Long-long Ma (马龙龙) & Jian Wu (吴健)

Corresponding author

Correspondence to Li-juan Duan (段立娟).

Additional information

This work has been supported by the Innovation Platform Construction of Qinghai Province (No.2016-ZJ-Y04), and the Basic Research Program of Qinghai Province (No.2016-ZJ-740). This paper was presented in part at the CCF Chinese Conference on Computer Vision, Tianjin, 2017. This paper was recommended by the program committee.

About this article

Cite this article

Duan, Lj., Zhang, Xq., Ma, Ll. et al. Text extraction method for historical Tibetan document images based on block projections. Optoelectron. Lett. 13, 457–461 (2017). https://doi.org/10.1007/s11801-017-7197-0

Received: 30 August 2017
Revised: 18 September 2017
Published: 17 November 2017
Issue Date: November 2017
DOI: https://doi.org/10.1007/s11801-017-7197-0

Document code

A
