Light-Weight Document Image Cleanup Using Perceptual Loss

Part of the Lecture Notes in Computer Science book series (LNIP, volume 12823)

Abstract

Smartphones have enabled effortless capturing and sharing of documents in digital form. The documents, however, often undergo various types of degradation due to aging, stains, or shortcomings of the capturing environment such as shadows and non-uniform lighting, which reduce the comprehensibility of the document images. In this work, we consider the problem of document image cleanup in embedded applications such as smartphone apps, which usually have memory, energy, and latency limitations imposed by the device and/or by the need for a good user experience. We propose a light-weight encoder-decoder-based convolutional neural network architecture for removing the noisy elements from document images. To compensate for the reduced generalization performance of a low-capacity network, we incorporate the perceptual loss into our loss function for knowledge transfer from a pre-trained deep CNN. In terms of the number of parameters and product-sum operations, our models are 65–1030 and 3–27 times smaller, respectively, than existing state-of-the-art document enhancement models. Overall, the proposed models offer a favorable resource versus accuracy trade-off, and we empirically illustrate the efficacy of our approach on several real-world benchmark datasets.
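
As a rough illustration of how a perceptual term can sit alongside a pixel-wise term in a cleanup loss, the following is a minimal NumPy sketch, not the authors' implementation. Everything in it is an assumption made for illustration: the frozen feature extractor is a small random filter bank standing in for a layer of a pre-trained CNN, the pixel term is taken to be L1, and the names frozen_features, total_loss, and the weight value are hypothetical.

```python
import numpy as np


def conv2d_valid(image, kernels):
    """Correlate a 2-D image (H, W) with a kernel bank (K, kh, kw).

    Returns feature maps of shape (K, H - kh + 1, W - kw + 1).
    """
    _, kh, kw = kernels.shape
    # All (kh, kw) patches of the image: shape (H-kh+1, W-kw+1, kh, kw).
    patches = np.lib.stride_tricks.sliding_window_view(image, (kh, kw))
    return np.einsum("ijab,kab->kij", patches, kernels)


def frozen_features(image, kernels):
    """Hypothetical stand-in for one layer of a frozen, pre-trained CNN:
    a fixed convolution followed by ReLU. The kernels are never updated."""
    return np.maximum(conv2d_valid(image, kernels), 0.0)


def pixel_loss(pred, target):
    """Mean absolute error in pixel space (assumed choice of pixel term)."""
    return float(np.mean(np.abs(pred - target)))


def perceptual_loss(pred, target, kernels):
    """Mean squared error between frozen feature maps of the two images."""
    diff = frozen_features(pred, kernels) - frozen_features(target, kernels)
    return float(np.mean(diff ** 2))


def total_loss(pred, target, kernels, weight=0.1):
    """Pixel term plus weighted perceptual term; the weight is illustrative."""
    return pixel_loss(pred, target) + weight * perceptual_loss(pred, target, kernels)


# Toy data: a random "clean" page and a noisy version of it.
rng = np.random.default_rng(0)
kernels = rng.standard_normal((4, 3, 3))   # fixed filters, stand-in for pre-trained weights
clean = rng.random((16, 16))
noisy = np.clip(clean + 0.2 * rng.standard_normal((16, 16)), 0.0, 1.0)

loss_identical = total_loss(clean, clean, kernels)
loss_noisy = total_loss(noisy, clean, kernels)
print(loss_identical, loss_noisy)
```

In a real training setup the filter bank would be replaced by intermediate activations of a pre-trained network, and only the light-weight cleanup model would receive gradient updates; the sketch only shows how the two terms are combined.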

Keywords

  • Document cleanup
  • Perceptual loss
  • Document binarization
  • Light-weight model

Corresponding author

Correspondence to Soumyadeep Dey.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Dey, S., Jawanpuria, P. (2021). Light-Weight Document Image Cleanup Using Perceptual Loss. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12823. Springer, Cham. https://doi.org/10.1007/978-3-030-86334-0_16

  • DOI: https://doi.org/10.1007/978-3-030-86334-0_16

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86333-3

  • Online ISBN: 978-3-030-86334-0

  • eBook Packages: Computer Science, Computer Science (R0)