Detection and Classification of Interesting Parts in Scanned Documents by Means of AdaBoost Classification and Low-Level Features Verification

  • Conference paper
Computer Analysis of Images and Patterns (CAIP 2015)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9257)

Abstract

This paper presents a novel approach to detecting and identifying selected parts of documents (stamps, logos, printed text blocks, signatures, and tables) in digital images obtained by scanning paper documents. The task is carried out in two main steps. The first is element detection, performed by an AdaBoost cascade of weak classifiers. In the second step, the resulting image blocks undergo verification. Eight feature vectors based on recently proposed descriptors were selected and combined with six different classifiers representing a range of approaches to data classification. Experiments performed on a large set of document images gathered from the Internet gave encouraging results.
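The two-step idea described in the abstract (boosted detection followed by verification on a separate feature vector) can be illustrated with a minimal sketch. It is not the authors' implementation: it trains plain AdaBoost over threshold stumps rather than a staged cascade, it uses a 1-nearest-neighbour check as a stand-in for the six classifiers mentioned in the abstract, and all feature values, class prototypes, and function names are hypothetical.

```python
import math

def stump_predict(x, feature, threshold, polarity):
    """Weak classifier: thresholds a single feature, returns +1 or -1."""
    return polarity if x[feature] >= threshold else -polarity

def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                err = sum(wi for wi, x, yi in zip(w, X, y)
                          if stump_predict(x, feature, threshold, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def train_adaboost(X, y, rounds=5):
    """Discrete AdaBoost: reweight samples so later stumps focus on mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, feature, threshold, polarity = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feature, threshold, polarity))
        w = [wi * math.exp(-alpha * yi * stump_predict(x, feature, threshold, polarity))
             for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def detect(ensemble, x):
    """Step 1: weighted vote of the weak classifiers."""
    score = sum(a * stump_predict(x, f, t, p) for a, f, t, p in ensemble)
    return score >= 0

def verify(features, prototypes):
    """Step 2 (assumed stand-in): label of the nearest class prototype."""
    return min(prototypes, key=lambda proto: math.dist(features, proto[0]))[1]

def detect_and_verify(blocks, ensemble, prototypes, target):
    """Keep blocks that pass detection and are verified as the target class."""
    return [b for b in blocks
            if detect(ensemble, b["detector_features"])
            and verify(b["verification_features"], prototypes) == target]

# Hypothetical toy data: two detector features per training block.
X = [[0.1, 1.0], [0.2, 0.9], [0.8, 1.1], [0.9, 1.0]]
y = [-1, -1, 1, 1]
ensemble = train_adaboost(X, y)

# Hypothetical class prototypes for the verification feature space.
prototypes = [([0.9, 0.1], "stamp"), ([0.1, 0.9], "text")]
blocks = [
    {"id": "a", "detector_features": [0.85, 1.0], "verification_features": [0.8, 0.2]},
    {"id": "b", "detector_features": [0.15, 1.0], "verification_features": [0.8, 0.2]},
    {"id": "c", "detector_features": [0.95, 1.0], "verification_features": [0.2, 0.8]},
]
accepted = detect_and_verify(blocks, ensemble, prototypes, "stamp")
print([b["id"] for b in accepted])
```

In this toy run, block "b" is rejected by the detector and block "c" passes detection but fails verification, which is the role the second step plays: filtering false positives from the first.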

Author information

Corresponding author

Correspondence to Paweł Forczmański.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Markiewicz, A., Forczmański, P. (2015). Detection and Classification of Interesting Parts in Scanned Documents by Means of AdaBoost Classification and Low-Level Features Verification. In: Azzopardi, G., Petkov, N. (eds) Computer Analysis of Images and Patterns. CAIP 2015. Lecture Notes in Computer Science, vol 9257. Springer, Cham. https://doi.org/10.1007/978-3-319-23117-4_46

  • DOI: https://doi.org/10.1007/978-3-319-23117-4_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23116-7

  • Online ISBN: 978-3-319-23117-4

  • eBook Packages: Computer Science, Computer Science (R0)
