Clutter noise removal in binary document images

Original Paper

Abstract

This paper presents a clutter detection and removal algorithm for complex document images. The distance-transform-based technique aims to remove irregular, independent, unwanted clutter while preserving the text content. The novelty of the approach lies in its approximation of the clutter–content boundary when the clutter is attached to the content in irregular ways. As an intermediate step, a residual image is created, which forms the basis for clutter detection and removal. Detection and removal are independent of the clutter's position, size, shape, and connectivity with the text. The method is tested on a collection of highly degraded and noisy machine-printed and handwritten Arabic and English documents; results show pixel-level accuracies of 99.18% for clutter detection and 98.67% for clutter removal. The approach is also extended to documents containing a mix of clutter and salt-and-pepper noise.
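To make the distance-transform idea concrete, here is a minimal sketch. It is not the paper's algorithm: the abstract does not specify how the residual image is built or how the clutter–content boundary is approximated. It rests on one simple assumption, that clutter blobs are much thicker than text strokes. The function names `distance_to_zero` and `remove_clutter` and the parameter `max_half_width` are hypothetical, and the chessboard metric is chosen only for simplicity.

```python
def distance_to_zero(grid):
    """Chessboard distance from every cell to the nearest 0 cell.

    Classic two-pass raster scan. Cells outside the image are ignored,
    so the image border does not count as background.
    """
    h, w = len(grid), len(grid[0])
    inf = h + w  # larger than any distance that can occur inside the image
    d = [[0 if v == 0 else inf for v in row] for row in grid]
    forward = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]
    backward = [(1, 1), (1, 0), (1, -1), (0, 1)]
    passes = (
        (forward, range(h), range(w)),
        (backward, range(h - 1, -1, -1), range(w - 1, -1, -1)),
    )
    for offsets, ys, xs in passes:
        for y in ys:
            for x in xs:
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        d[y][x] = min(d[y][x], d[ny][nx] + 1)
    return d


def remove_clutter(img, max_half_width):
    """Return (cleaned, clutter_mask) for a binary image (1 = foreground).

    Assumption (not from the paper): a foreground pixel deeper than
    max_half_width inside its component belongs to a clutter core.
    """
    dt = distance_to_zero(img)
    # In `core`, 0 marks a clutter-core pixel so the same transform can be reused.
    core = [[0 if v > max_half_width else 1 for v in row] for row in dt]
    if not any(0 in row for row in core):
        return [row[:] for row in img], [[0] * len(row) for row in img]
    to_core = distance_to_zero(core)
    # Grow the core back by the same radius, restricted to foreground pixels.
    clutter = [
        [1 if fg and dist <= max_half_width else 0 for fg, dist in zip(img_row, dist_row)]
        for img_row, dist_row in zip(img, to_core)
    ]
    cleaned = [
        [fg - c for fg, c in zip(img_row, clutter_row)]
        for img_row, clutter_row in zip(img, clutter)
    ]
    return cleaned, clutter


# Toy page: a 7x7 clutter blob with a one-pixel-wide stroke attached to it.
HEIGHT, WIDTH = 9, 16
page = [[0] * WIDTH for _ in range(HEIGHT)]
for y in range(1, 8):
    for x in range(1, 8):
        page[y][x] = 1
for x in range(8, 15):
    page[4][x] = 1

cleaned, clutter = remove_clutter(page, max_half_width=2)
```

On this toy page the blob is marked as clutter and the attached stroke survives, even though the two form a single connected component. Real clutter is irregular, so the grow-back step only approximates the clutter boundary, and a fixed thickness threshold would have to be tuned or estimated per document.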

Keywords

Clutter removal; Noise border removal; Margin removal; Image enhancement; Pixel-based noise removal

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Institute of Advanced Computer Studies, University of Maryland, College Park, USA
