
Handwritten text separation from annotated machine printed documents using Markov Random Fields

  • Original Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Convenient search, whether on a personal computer's hard disk or on the web, is still limited mainly to machine-printed text documents and images because handwriting recognizers remain inaccurate. This paper focuses on separating handwritten text from machine-printed text in annotated documents, a task sometimes referred to as “ink separation”, with the aim of advancing the state of the art in search of hand-annotated documents. We propose a method with two main steps: patch-level separation and pixel-level separation. In the patch-level step, the entire document is modeled as a Markov Random Field (MRF). Three classes (machine-printed text, handwritten text, and overlapped text) are first identified using G-means based classification, followed by an MRF-based relabeling procedure. In the second, pixel-level step, an MRF-based classification approach uses pixel-level features to separate the overlapped text into machine-printed text and handwritten text. Experimental results on a set of machine-printed documents annotated by multiple writers in an office/collaborative environment show that the method is robust and provides good text separation performance.
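To make the patch-level relabeling idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the energy function or the inference algorithm, so this example assumes a Potts pairwise term over 4-connected patches and uses iterated conditional modes (ICM) as a simple stand-in for MRF inference. The function name `icm_relabel`, the parameter `beta`, and the toy costs are all hypothetical.

```python
# Sketch only: MRF-style relabeling of patch labels on a grid.
# Assumptions (not from the paper): Potts pairwise penalty, 4-neighbourhood,
# ICM inference, and made-up unary costs standing in for classifier output.

LABELS = ("printed", "handwritten", "overlapped")


def icm_relabel(unary, beta=1.0, max_iters=10):
    """Relabel patches by iterated conditional modes (ICM).

    unary[r][c][k] is the cost of assigning label k to patch (r, c),
    for example a negative log-likelihood from an initial classifier.
    beta is the penalty paid for each 4-neighbour carrying a different label.
    """
    rows, cols = len(unary), len(unary[0])
    n_labels = len(unary[0][0])

    # Initial labeling: the cheapest label for each patch taken in isolation.
    labels = [
        [min(range(n_labels), key=lambda k: unary[r][c][k]) for c in range(cols)]
        for r in range(rows)
    ]

    for _ in range(max_iters):
        changed = False
        for r in range(rows):
            for c in range(cols):
                # Current labels of the in-bounds 4-neighbours.
                neighbours = [
                    labels[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < rows and 0 <= cc < cols
                ]

                def local_energy(k):
                    # Unary cost plus Potts disagreement with the neighbours.
                    return unary[r][c][k] + beta * sum(k != n for n in neighbours)

                best = min(range(n_labels), key=local_energy)
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break  # no patch changed in a full sweep, so a local minimum is reached
    return labels


# Toy 3x3 page: every patch is confidently "printed" except the centre,
# which the initial classifier weakly prefers to call "handwritten".
confident_printed = [0.1, 2.0, 2.0]
weak_handwritten = [1.0, 0.8, 2.0]
unary = [[list(confident_printed) for _ in range(3)] for _ in range(3)]
unary[1][1] = list(weak_handwritten)

smoothed = icm_relabel(unary, beta=1.0)
for row in smoothed:
    print([LABELS[k] for k in row])
```

In this toy case the weakly classified centre patch is absorbed into its "printed" surroundings, while calling `icm_relabel(unary, beta=0.0)` leaves the initial labeling unchanged. ICM only finds a local minimum of the energy; other inference schemes, such as belief propagation, could be substituted without changing the overall structure.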



Author information

Corresponding author

Correspondence to Xujun Peng.

About this article

Cite this article

Peng, X., Setlur, S., Govindaraju, V. et al. Handwritten text separation from annotated machine printed documents using Markov Random Fields. IJDAR 16, 1–16 (2013). https://doi.org/10.1007/s10032-011-0179-z

  • DOI: https://doi.org/10.1007/s10032-011-0179-z
