
Text extraction method for historical Tibetan document images based on block projections

Optoelectronics Letters

Abstract

Text extraction is an important first step in digitizing historical documents. In this paper, we present a text extraction method for historical Tibetan document images based on block projections, treating text extraction as a problem of detecting and locating text areas. Each image is divided into equal-sized blocks, and the blocks are filtered using the categories of their connected components and their corner-point density. Analyzing the projections of the retained blocks locates the approximate text areas, from which the text regions are extracted. Experiments on a dataset of historical Tibetan documents demonstrate the effectiveness of the proposed method.
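The abstract describes the pipeline only at a high level, so the following is a minimal Python sketch of a block-projection text locator under stated assumptions; it is not the authors' implementation. Assumptions: the page is already binarized (1 = ink); blocks are fixed 8×8 squares; the "categories of connected components" are approximated by a hypothetical size range (`min_cc`/`max_cc`) standing in for a character-sized category; "corner-point density" is approximated by the fraction of 2×2 windows holding 1 or 3 ink pixels, a simple corner count for binary shapes rather than a Harris-style detector; and the text area is taken as the longest run of block rows and block columns whose projection reaches a hypothetical `min_blocks`. All helper names and thresholds are illustrative.

```python
from collections import deque

# Hypothetical sketch: block size, thresholds and both block tests are
# illustrative stand-ins, not values or criteria taken from the paper.

def split_into_blocks(height, width, block_size):
    """Yield (row, col, y0, x0, y1, x1) for an equal grid of blocks."""
    for r, y0 in enumerate(range(0, height, block_size)):
        for c, x0 in enumerate(range(0, width, block_size)):
            yield r, c, y0, x0, min(y0 + block_size, height), min(x0 + block_size, width)

def component_sizes(img, y0, x0, y1, x1):
    """Sizes of 4-connected ink components (value 1) inside one block."""
    seen, sizes = set(), []
    for y in range(y0, y1):
        for x in range(x0, x1):
            if not img[y][x] or (y, x) in seen:
                continue
            seen.add((y, x))
            queue, size = deque([(y, x)]), 0
            while queue:  # breadth-first flood fill restricted to the block
                cy, cx = queue.popleft()
                size += 1
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if y0 <= ny < y1 and x0 <= nx < x1 and img[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes

def corner_density(img, y0, x0, y1, x1):
    """Fraction of 2x2 windows with 1 or 3 ink pixels (a corner of a binary shape)."""
    windows = corners = 0
    for y in range(y0, y1 - 1):
        for x in range(x0, x1 - 1):
            ink = img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]
            windows += 1
            if ink in (1, 3):
                corners += 1
    return corners / windows if windows else 0.0

def is_text_block(img, y0, x0, y1, x1, min_cc=2, max_cc=40, min_density=0.02):
    """Keep a block holding a 'character-sized' component (hypothetical size
    category) whose corner density reaches min_density."""
    has_char = any(min_cc <= s <= max_cc for s in component_sizes(img, y0, x0, y1, x1))
    return has_char and corner_density(img, y0, x0, y1, x1) >= min_density

def longest_run(profile, min_count):
    """(start, end) of the longest run with profile[i] >= min_count, or None."""
    best, start = None, None
    for i, v in enumerate(profile + [min_count - 1]):  # sentinel closes a final run
        if v >= min_count and start is None:
            start = i
        elif v < min_count and start is not None:
            if best is None or i - start > best[1] - best[0] + 1:
                best = (start, i - 1)
            start = None
    return best

def extract_text_region(img, block_size=8, min_blocks=2):
    """Return (top, left, bottom, right) of the text area in pixels, bottom/right exclusive."""
    h, w = len(img), len(img[0])
    rows, cols = (h + block_size - 1) // block_size, (w + block_size - 1) // block_size
    keep = [[False] * cols for _ in range(rows)]
    for r, c, y0, x0, y1, x1 in split_into_blocks(h, w, block_size):
        keep[r][c] = is_text_block(img, y0, x0, y1, x1)
    row_proj = [sum(row) for row in keep]                                   # kept blocks per block row
    col_proj = [sum(keep[r][c] for r in range(rows)) for c in range(cols)]  # kept blocks per block column
    rr, cc = longest_run(row_proj, min_blocks), longest_run(col_proj, min_blocks)
    if rr is None or cc is None:
        return None
    return (rr[0] * block_size, cc[0] * block_size,
            min((rr[1] + 1) * block_size, h), min((cc[1] + 1) * block_size, w))

def make_demo_image():
    """48x48 synthetic page: 2x2 'characters' in block rows 2-3, cols 1-4, plus distractors."""
    img = [[0] * 48 for _ in range(48)]
    def square(y, x):
        for dy in (0, 1):
            for dx in (0, 1):
                img[y + dy][x + dx] = 1
    for br in (2, 3):
        for bc in (1, 2, 3, 4):
            square(br * 8 + 3, bc * 8 + 3)
    square(3, 43)   # lone mark in block (0, 5): dropped by the projection threshold
    img[2][2] = 1   # one-pixel speck: fails the component-size test
    for y in range(40, 48):
        img[y][40:48] = [1] * 8  # 8x8 blob: too large to count as a character
    return img

if __name__ == "__main__":
    print(extract_text_region(make_demo_image()))  # → (16, 8, 32, 40)
```

Because each block is judged independently, the projection step is what discards isolated marks in this sketch: a single retained block in an otherwise empty block row or column never reaches `min_blocks`, so it cannot stretch the detected text area.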



Author information


Corresponding author

Correspondence to Li-juan Duan  (段立娟).

Additional information

This work has been supported by the Innovation Platform Construction of Qinghai Province (No. 2016-ZJ-Y04) and the Basic Research Program of Qinghai Province (No. 2016-ZJ-740). This paper was presented in part at the CCF Chinese Conference on Computer Vision, Tianjin, 2017, and was recommended by the program committee.


About this article


Cite this article

Duan, Lj., Zhang, Xq., Ma, Ll. et al. Text extraction method for historical Tibetan document images based on block projections. Optoelectron. Lett. 13, 457–461 (2017). https://doi.org/10.1007/s11801-017-7197-0


  • DOI: https://doi.org/10.1007/s11801-017-7197-0
