Document image binarization by two-stage block extraction and background intensity determination

  • Theoretical Advances
  • Pattern Analysis and Applications

Abstract

This paper presents a novel approach to binarizing document images. First, a two-stage extraction procedure extracts every block in the document image that has its own background intensity. Then, the intensity distribution of each block is calculated to determine the variation ranges of its background intensity. Within each extracted block, pixels whose intensity values fall within these ranges are regarded as background pixels. Pixels outside all extracted blocks are binarized using Otsu’s global thresholding method. To evaluate the system, 275 representative document images were collected and binarized using the proposed approach and other existing approaches. The binarized images were fed into the same optical character recognition system, and the recognition accuracy on the extracted characters was used to assess the practicality of each method. The proposed binarization method obtained the highest recognition accuracy.
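
The final two binarization steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the block extraction stage is not reproduced, and the block representation (a row slice, a column slice, and a single background range `lo`..`hi` per block) is a hypothetical simplification. The function names `otsu_threshold` and `binarize` are likewise illustrative. The sketch also assumes that Otsu's threshold is computed from the pixels outside the blocks only, and that those pixels are dark text on a lighter page.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the level t (0..255) that maximizes Otsu's between-class variance.

    `gray` is a uint8 array; pixels <= t form one class, pixels > t the other.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # probability of the class "<= t"
    mu = np.cumsum(prob * np.arange(256))    # cumulative first moment up to t
    mu_total = mu[-1]
    num = (mu_total * omega - mu) ** 2
    denom = omega * (1.0 - omega)
    # Between-class variance; left at 0 where one of the classes is empty.
    sigma_b = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
    return int(np.argmax(sigma_b))

def binarize(gray, blocks):
    """Binarize `gray` (uint8); 255 marks background and 0 marks foreground.

    `blocks` is a hypothetical list of (row_slice, col_slice, lo, hi) tuples,
    where lo..hi is an assumed background intensity range for that block.
    """
    out = np.zeros(gray.shape, dtype=np.uint8)
    in_block = np.zeros(gray.shape, dtype=bool)
    for rows, cols, lo, hi in blocks:
        region = gray[rows, cols]
        # Inside a block: intensities within the range count as background.
        out[rows, cols] = np.where((region >= lo) & (region <= hi), 255, 0)
        in_block[rows, cols] = True
    outside = ~in_block
    if outside.any():
        # Outside all blocks: global Otsu threshold (assumes dark text on light paper).
        t = otsu_threshold(gray[outside])
        out[outside] = np.where(gray[outside] > t, 255, 0)
    return out

# Usage: a gray block with one light "text" pixel, next to a white area with one dark pixel.
page = np.full((4, 8), 220, dtype=np.uint8)
page[:, :4] = 100     # block whose background intensity is about 100
page[1, 1] = 200      # light text pixel inside the block
page[2, 6] = 20       # dark text pixel outside the block
result = binarize(page, [(slice(0, 4), slice(0, 4), 90, 110)])
```

Treating each block by its own background range lets light text on a dark block be classified correctly, which a single global threshold would not do.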

References

  1. Lee SU, Chung SY (1990) A comparative performance study of several global thresholding techniques for segmentation. Comput Vis Graph 52:171–190

  2. Otsu N (1979) A thresholding selection method from gray-scale histogram. IEEE T Syst Man Cyb 9:62–66

  3. Kittler J, Illingworth J (1985) On threshold selection using clustering criteria. IEEE T Syst Man Cyb 15:652–655

  4. Kapur JN, Sahoo PK, Wong AKC (1985) A new method for gray-level picture thresholding using the entropy of the histogram. Comput Vis Graph 29:273–285

  5. Trier ØD, Taxt T (1995) Evaluation of binarization methods for document images. IEEE Trans Pattern Anal 17(3):312–315

  6. Trier ØD, Jain AK (1995) Goal-directed evaluation of binarization method. IEEE Trans Pattern Anal 17(12):1191–1201

  7. Niblack W (1986) An introduction to digital image processing. Prentice Hall, New Jersey, pp 115–116

  8. Bernsen J (1986) Dynamic thresholding of gray-level images. In: Proc Int Conf on Pat Recog, pp 1251–1255

  9. Tabbone S, Wendling L (2000) Multi-scale binarization of images. Pattern Recognit Lett 24(1–3):927–943

  10. Sauvola J, Pietikainen M (2000) Adaptive document image binarization. Pattern Recognit 33(2):225–236

  11. Dawoud A, Kamel MS (2004) Iterative multimodel subimage binarization for handwritten character segmentation. IEEE Trans Image Process 13(9):1223–1230

  12. Murtagh F, Starck JL (2003) Quantization from Bayes factors with application to multilevel thresholding. Pattern Recognit Lett 24(12):2001–2007

  13. O’Gorman L (1994) Binarization and multithresholding of document images using connectivity. Graph Model Im Proc 56(6):494–506

  14. Chang CC, Wang LL (1997) A fast multilevel thresholding method based on lowpass and highpass filtering. Pattern Recognit Lett 18(14):1469–1478

  15. Sezgin M, Tasaltin R (2000) A new dichotomization technique to multilevel thresholding devoted to inspection applications. Pattern Recognit Lett 21(2):151–161

  16. Gonzalez RC, Woods RE (2002) Digital image processing, 2/E. Prentice-Hall, New Jersey, pp 578–579

  17. Tseng YH, Kuo CC, Lee HJ (1998) Speeding up Chinese character recognition in an automatic document reading system. Pattern Recognit 31(11):1601–1612

Author information

Corresponding author

Correspondence to Yi-Hong Tseng.

About this article

Cite this article

Tseng, YH., Lee, HJ. Document image binarization by two-stage block extraction and background intensity determination. Pattern Anal Applic 11, 33–44 (2008). https://doi.org/10.1007/s10044-007-0077-7

  • DOI: https://doi.org/10.1007/s10044-007-0077-7
