Can You Read Me Now? Content Aware Rectification Using Angle Supervision

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12357)

Abstract

The ubiquity of smartphone cameras has led to more and more documents being captured by cameras rather than scanned. Unlike flatbed scanners, photographed documents are often folded and crumpled, resulting in large local variance in text structure. The problem of document rectification is fundamental to the Optical Character Recognition (OCR) process on documents, and its ability to overcome geometric distortions significantly affects recognition accuracy. Despite the great progress in recent OCR systems, most still rely on a pre-process that ensures the text lines are straight and axis aligned. Recent works have tackled the problem of rectifying document images taken in-the-wild using various supervision signals and alignment means. However, they focused on global features that can be extracted from the document’s boundaries, ignoring various signals that could be obtained from the document’s content.

We present CREASE: Content Aware Rectification using Angle Supervision, the first learned method for document rectification that relies on the document’s content, the location of the words and specifically their orientation, as hints to assist in the rectification process. We utilize a novel pixel-wise angle regression approach and a curvature estimation side-task for optimizing our rectification model. Our method surpasses previous approaches in terms of OCR accuracy, geometric error and visual similarity.
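To make the pixel-wise angle supervision concrete, here is a minimal sketch of what a per-pixel angle regression loss could look like. This is an illustrative assumption, not the paper's actual implementation: it compares predicted and ground-truth text orientations through their (cos, sin) unit vectors, which avoids the wrap-around discontinuity of raw angles, and restricts supervision to pixels covered by text, mirroring the content-aware idea.

```python
import math

def angle_loss(pred_angles, gt_angles, text_mask):
    """Illustrative pixel-wise angle regression loss over text pixels.

    pred_angles, gt_angles: flat lists of angles in radians, one per pixel.
    text_mask: flat list of 0/1 flags marking pixels that contain text.

    Each angle is compared through its (cos, sin) unit vector, so an
    angle of 0 and an angle of 2*pi incur zero loss, as they should.
    """
    total, count = 0.0, 0
    for p, g, m in zip(pred_angles, gt_angles, text_mask):
        if not m:
            continue  # only text pixels contribute supervision
        total += abs(math.cos(p) - math.cos(g)) + abs(math.sin(p) - math.sin(g))
        count += 1
    return total / count if count else 0.0
```

In a real model these angles would be dense maps produced by a convolutional decoder; the masking step is what makes the signal "content aware", since blank regions of the page carry no orientation information.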

Supplementary material

Supplementary material 1 (PDF, 8,228 KB): 504453_1_En_13_MOESM1_ESM.pdf


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

Amazon Web Services, Seattle, USA
