
\(\hbox {TG}^2\): text-guided transformer GAN for restoring document readability and perceived quality


Most image enhancement methods aimed at restoring digitized textual documents are limited to cases where the text information is still preserved in the input image, which is often not the case. In this work, we propose a novel generative document restoration method that conditions the restoration on a guiding signal in the form of a target text transcription and that does not require paired high- and low-quality images for training. We introduce a neural network architecture with an implicit text-to-image alignment module. We demonstrate good results on inpainting, debinarization, and deblurring tasks, and we show that the trained models can be used to manually alter text in document images. A user study shows that human observers confuse the outputs of the proposed enhancement method with reference high-quality images in as many as 30% of cases.
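The core idea of conditioning restoration on a target transcription can be illustrated with cross-attention, in which degraded-image patch features act as queries over character embeddings of the guiding text. The sketch below is a minimal, hypothetical illustration of such a text-to-image alignment step, not the paper's exact module; all names and dimensions are assumptions for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(img_feats, txt_embeds, d_k):
    # img_feats:  (num_patches, d_k) queries from the degraded image
    # txt_embeds: (num_chars, d_k)   keys/values from the target transcription
    scores = img_feats @ txt_embeds.T / np.sqrt(d_k)  # (patches, chars)
    align = softmax(scores, axis=-1)                  # soft text-to-image alignment
    # Each image patch receives a mixture of character embeddings,
    # which a generator could then use to render the correct glyphs.
    return align @ txt_embeds, align

rng = np.random.default_rng(0)
img = rng.normal(size=(16, 32))  # 16 image patches, 32-dim features
txt = rng.normal(size=(10, 32))  # 10 characters in the transcription
out, align = cross_attention(img, txt, 32)
```

Because the alignment is learned implicitly through attention weights rather than supplied explicitly, the same mechanism also permits editing: swapping the transcription changes which character embeddings each patch attends to.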




  1. The demonstration tool with the trained newspaper restoration and inpainting models, along with image examples, is publicly available online. The repository also includes training scripts and links to training data.





This work has been supported by the Ministry of Culture Czech Republic in NAKI II project PERO (DG18P02OVV055) and by the Ministry of Education, Youth and Sports of the Czech Republic from the National Programme of Sustainability (NPU II), through the Project IT4Innovations Excellence in Science under Grant LQ1602. We gratefully acknowledge the support of the NVIDIA Corporation with the donation of one NVIDIA TITAN Xp GPU for this research.

Author information



Corresponding author

Correspondence to Oldřich Kodym.



Cite this article

Kodym, O., Hradiš, M. \(\hbox {TG}^2\): text-guided transformer GAN for restoring document readability and perceived quality. IJDAR 25, 15–28 (2022).



  • Generative adversarial networks
  • Attention neural networks
  • Textual document restoration
  • Text inpainting