Conv-transformer architecture for unconstrained off-line Urdu handwriting recognition

  • Special Issue Paper
International Journal on Document Analysis and Recognition (IJDAR)


Unconstrained off-line handwritten text recognition, in general and for Arabic-like scripts in particular, is a challenging task and remains an active research area. Transformer-based models for English handwriting recognition have recently shown promising results. In this paper, we explore the use of the transformer architecture for Urdu handwriting recognition. The highlights of the proposed work are the use of a convolutional neural network before a vanilla full transformer and the use of printed Urdu text lines alongside handwritten text lines during training. The convolutional layers reduce the spatial resolution and thereby compensate for the \(n^{2}\) complexity of the transformer's multi-head attention layers. Moreover, the printed text images used in training help the model learn a greater number of ligatures (a prominent feature of Arabic-like scripts) and a better language model. Our model achieved state-of-the-art accuracy (character error rate, CER, of \(5.31\%\)) on the publicly available NUST-UHWR dataset (Zia et al. in Neural Comput Appl 34:1–14, 2021).
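
The two quantitative ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the image size (64 × 1024), the downsampling factor of 8, and the function names `attention_pairs`, `edit_distance` and `cer` are assumptions made for illustration only. CER is computed here in its common form, Levenshtein distance divided by reference length; the abstract does not state the exact normalization used in the paper.

```python
def attention_pairs(height: int, width: int, downsample: int = 1) -> int:
    """Query-key pairs in one self-attention map over a flattened feature grid."""
    # Ceil-divide each side, roughly what a padded strided convolution does.
    h = -(-height // downsample)
    w = -(-width // downsample)
    n = h * w  # sequence length n seen by the transformer
    return n * n  # attention cost grows as n^2


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical 64 x 1024 text-line image, with and without 8x downsampling.
    raw = attention_pairs(64, 1024)
    reduced = attention_pairs(64, 1024, downsample=8)
    print(raw // reduced)            # factor by which the attention map shrinks
    print(cer("kitten", "sitting"))  # toy ASCII example
```

Downsampling each side by a factor of \(s\) shortens the sequence by \(s^{2}\) and the attention map by \(s^{4}\), which is why a convolutional front end makes full self-attention affordable on long text-line images. Note that this `cer` counts Unicode code points, so Urdu text with combining marks may need normalization before scoring.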

  1. Sagheer, M.W., He, C.L., Nobile, N., Suen, C.Y.: Holistic Urdu handwritten word recognition using support vector machine. In: 2010 20th International Conference on Pattern Recognition, pp. 1900–1903 (2010)

  2. Zia, N., Naeem, M.F., Raza, S.K., Khan, M.M., Ul-Hasan, A., Shafait, F.: A convolutional recursive deep architecture for unconstrained Urdu handwriting recognition. Neural Comput. Appl. 34, 1–14 (2021)

  3. Naz, S., Umar, A.I., Ahmad, R., Siddiqi, I., Ahmed, S.B., Razzak, M.I., Shafait, F.: Urdu Nastaliq recognition using convolutional–recursive deep learning. Neurocomputing 243, 80–87 (2017)

  4. Asad, K., Asghar, M.Z., Anam, S., Hameed, I.A., Asif, S.H., Shakeel, A.: A survey on sentiment analysis in Urdu: a resource-poor language. Egypt. Inform. J. 22(1), 53–74 (2021)

  5. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A., Kaiser, L., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)

  6. Hassan, S., Irfan, A., Mirza, A., Siddiqi, I.: Cursive handwritten text recognition using bi-directional LSTMs: a case study on Urdu handwriting. In: 2019 International Conference on Deep Learning and Machine Learning in Emerging Applications (Deep-ML), pp. 67–72 (2019)

  7. Husnain, M., Saad Missen, M.M., Mumtaz, S., Jhanidr, M.Z., Coustaty, M., Muzzamil, L.M., Ogier, J., Choi, G.S.: Recognition of Urdu handwritten characters using convolutional neural network. Appl. Sci. 9(13), 2758 (2019)

  8. Ul-Hasan, A., Ahmed, S.B., Rashid, F., Shafait, F., Breuel, T.M.: Offline printed Urdu Nastaleeq script recognition with bidirectional LSTM networks. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1061–1065 (2013)

  9. Naz, S., Umar, A.I., Ahmed, R., Siddiqi, I., Ahmed, S., Razzak, M.I., Shafait, F.: Urdu Nastaliq recognition using convolutional–recursive deep learning. Neurocomputing 243, 80–87 (2017)

  10. Naz, S., Umar, A.I., Ahmed, R., Razzak, M.I., Rashid, S.F., Shafait, F.: Urdu Nasta’liq text recognition using implicit segmentation based on multi-dimensional long short term memory neural networks. SpringerPlus 5, 1–16 (2016)

  11. Khan, K., Haider, I.: Online recognition of multi-stroke handwritten Urdu characters. In: 2010 International Conference on Image Analysis and Signal Processing, pp. 284–290 (2010)

  12. Ahmed, S., Naz, S., Razzak, S.M., Umar, A.: Ucom offline dataset—a Urdu handwritten dataset generation. Int. Arab J. Inf. Technol. 14(2), 03 (2016)

  13. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding

  14. Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020)

  15. Michael, J., Labahn, R., Gruning, T., Zollner, J.: Evaluating sequence-to-sequence models for handwritten text recognition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1286–1293. IEEE (2019)

  16. Li, M., Lv, T., Cui, L., Lu, Y., Florencio, D., Zhang, C., Li, Z., Wei, F.: TrOCR: Transformer-based optical character recognition with pre-trained models. arXiv preprint arXiv:2109.10282 (2021)

  17. Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jégou, H.: Training data-efficient image transformers & distillation through attention. In: Meila, M., Zhang, T. (eds.), Proceedings of the 38th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 139, pp. 10347–10357. PMLR, 18–24 (2021)

  18. Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jégou, H.: Training data-efficient image transformers & distillation through attention (2020)

  19. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A robustly optimized BERT pretraining approach (2019)

  20. Riaz, N., Latif, S., Latif, R.: From transformers to reformers. In: 2021 International Conference on Digital Futures and Transformative Technologies (ICoDT2), pp. 1–6 (2021)

  21. O’Shea, K., Nash, R.: An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458 (2015)

  22. Naeem, M.F., Zia, N., Awan, A., Ul-Hasan, A., Shafait, F.: Impact of ligature coverage on training practical Urdu OCR systems. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 01, pp. 131–136 (2017)

  23. Rehman, A., Ul-Hasan, A., Shafait, F.: High performance Urdu and Arabic video text recognition using convolutional recurrent neural networks. In: International Conference on Document Analysis and Recognition, pp. 336–352. Springer (2021)

Author information

Authors and Affiliations


Corresponding authors

Correspondence to Nauman Riaz or Adnan Ul-Hasan.

Ethics declarations

Conflict of interest

The author(s) disclosed no possible conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research is partially funded by the Higher Education Commission (HEC), Pakistan, through its grant for the National Center of Artificial Intelligence (NCAI).

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Riaz, N., Arbab, H., Maqsood, A. et al. Conv-transformer architecture for unconstrained off-line Urdu handwriting recognition. IJDAR 25, 373–384 (2022).
