Unsupervised Text Binarization in Handwritten Historical Documents Using k-Means Clustering

  • Conference paper
Proceedings of SAI Intelligent Systems Conference (IntelliSys) 2016 (IntelliSys 2016)

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 16))

Abstract

In this paper, we propose a novel technique for unsupervised text binarization in handwritten historical documents using k-means clustering. Text binarization faces several challenges, such as noise, faint characters, and bleed-through, which must be overcome to increase the correct detection rate. To address these problems, a preprocessing strategy is first used to enhance the contrast and thereby improve faint characters, and a Gaussian Mixture Model (GMM) is used to ignore noise and other artifacts in the handwritten historical documents. The enhanced image is then normalized for use in the postprocessing part of the proposed method. The binarized handwritten image is obtained by partitioning the normalized pixel values into two clusters using k-means clustering with k = 2, and then assigning each normalized pixel to one of the two clusters according to the minimum Euclidean distance between the pixel's normalized intensity and the mean normalized pixel value of each cluster. Experimental results verify the effectiveness of the proposed approach.
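
The final clustering step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the contrast enhancement and GMM stages are omitted, and the min-max normalization, the initialization of the two means at the intensity extremes, the convention that the darker cluster is text, and all function names (`normalize`, `kmeans_two_clusters`, `binarize`) are assumptions made here for illustration only.

```python
from typing import List, Tuple


def normalize(pixels: List[float]) -> List[float]:
    """Min-max scale intensities to [0, 1] (assumed normalization)."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]


def kmeans_two_clusters(
    values: List[float], max_iter: int = 100
) -> Tuple[List[int], Tuple[float, float]]:
    """1-D k-means with k = 2; returns per-pixel labels and the two means."""
    # Assumed initialization: start the means at the intensity extremes.
    c0, c1 = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: in 1-D the Euclidean distance is |v - c|.
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        # Update step: recompute each mean; keep the old one if a cluster is empty.
        new_c0 = sum(group0) / len(group0) if group0 else c0
        new_c1 = sum(group1) / len(group1) if group1 else c1
        if new_c0 == c0 and new_c1 == c1:
            break  # means unchanged, so the assignment has converged
        c0, c1 = new_c0, new_c1
    return labels, (c0, c1)


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Return 1 for pixels in the darker cluster (assumed text), 0 otherwise."""
    width = len(image[0])
    flat = [float(p) for row in image for p in row]
    labels, (c0, c1) = kmeans_two_clusters(normalize(flat))
    text_label = 0 if c0 <= c1 else 1  # cluster with the lower mean intensity
    binary = [1 if lab == text_label else 0 for lab in labels]
    return [binary[i:i + width] for i in range(0, len(binary), width)]


if __name__ == "__main__":
    # Tiny hypothetical grayscale patch: dark strokes on a light background.
    patch = [[200, 210, 30], [205, 25, 220]]
    for row in binarize(patch):
        print(row)
```

With k = 2 on scalar intensities, the nearest-mean assignment is equivalent to thresholding at the midpoint of the two cluster means, so the result is a single global threshold learned from the image itself.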

Acknowledgement

This work is part of the research project “Scalable resource-efficient systems for big data analytics” funded by the Knowledge Foundation (grant: 20140032) in Sweden.

Corresponding author

Correspondence to Huseyin Kusetogullari.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Kusetogullari, H. (2018). Unsupervised Text Binarization in Handwritten Historical Documents Using k-Means Clustering. In: Bi, Y., Kapoor, S., Bhatia, R. (eds) Proceedings of SAI Intelligent Systems Conference (IntelliSys) 2016. IntelliSys 2016. Lecture Notes in Networks and Systems, vol 16. Springer, Cham. https://doi.org/10.1007/978-3-319-56991-8_3

  • DOI: https://doi.org/10.1007/978-3-319-56991-8_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56990-1

  • Online ISBN: 978-3-319-56991-8

  • eBook Packages: Engineering, Engineering (R0)