Abstract
Unconstrained off-line handwritten text recognition is a challenging task in general, and for Arabic-like scripts in particular, and it remains an active research area. Transformer-based models have recently shown promising results for English handwriting recognition. In this paper, we explore the use of the transformer architecture for Urdu handwriting recognition. The highlights of the proposed work are the use of a convolutional neural network before a vanilla full transformer and the use of printed Urdu text lines alongside handwritten text lines during training. The convolutional layers reduce the spatial resolution of the input and thereby compensate for the \(n^{2}\) complexity of the transformer's multi-head attention layers. Moreover, the printed text images used in training help the model learn a greater number of ligatures (a prominent feature of Arabic-like scripts) and a better language model. Our model achieved state-of-the-art accuracy (a CER of \(5.31\%\)) on the publicly available NUST-UHWR dataset (Zia et al. in Neural Comput Appl 34:1–14, 2021).
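The abstract's point about the \(n^{2}\) cost of attention can be illustrated with a little arithmetic. The sketch below is not the authors' implementation: the image size and downsampling factor are hypothetical values chosen only for illustration, and the helper names are our own. It counts the query-key pairs a single attention head would score with and without a convolutional front end that shrinks each spatial axis.

```python
def attention_pairs(seq_len: int) -> int:
    """Number of query-key pairs one self-attention head scores: n^2."""
    return seq_len * seq_len


def tokens_after_downsampling(height: int, width: int, factor: int) -> int:
    """Sequence length when each spatial axis is reduced by `factor`."""
    return (height // factor) * (width // factor)


# Hypothetical text-line image size and downsampling factor (assumptions,
# not values taken from the paper).
HEIGHT, WIDTH, FACTOR = 64, 1024, 8

raw_tokens = tokens_after_downsampling(HEIGHT, WIDTH, 1)  # one token per pixel
conv_tokens = tokens_after_downsampling(HEIGHT, WIDTH, FACTOR)

raw_cost = attention_pairs(raw_tokens)
conv_cost = attention_pairs(conv_tokens)

print(f"tokens without conv front end: {raw_tokens}")
print(f"tokens with conv front end:    {conv_tokens}")
print(f"attention-pair reduction:      {raw_cost // conv_cost}x")
```

Under these assumptions, reducing each axis by a factor of \(f\) cuts the sequence length by \(f^{2}\) and the number of attention pairs by \(f^{4}\), which is why even modest spatial downsampling makes full attention over a text line far cheaper.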
References
Sagheer, M.W., He, C.L., Nobile, N., Suen, C.Y.: Holistic Urdu handwritten word recognition using support vector machine. In: 2010 20th International Conference on Pattern Recognition, pp. 1900–1903 (2010)
Zia, N., Naeem, M.F., Raza, S.K., Khan, M.M., Ul-Hasan, A., Shafait, F.: A convolutional recursive deep architecture for unconstrained Urdu handwriting recognition. Neural Comput. Appl. 34, 1–14 (2021)
Naz, S., Umar, A.I., Ahmad, R., Siddiqi, I., Ahmed, S.B., Razzak, M.I., Shafait, F.: Urdu Nastaliq recognition using convolutional–recursive deep learning. Neurocomputing 243, 80–87 (2017)
Asad, K., Asghar, M.Z., Anam, S., Hameed, I.A., Asif, S.H., Shakeel, A.: A survey on sentiment analysis in Urdu: a resource-poor language. Egypt. Inform. J. 22(1), 53–74 (2021)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A., Kaiser, L., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Hassan, S., Irfan, A., Mirza, A., Siddiqi, I.: Cursive handwritten text recognition using bi-directional LSTMs: a case study on Urdu handwriting. In: 2019 International Conference on Deep Learning and Machine Learning in Emerging Applications (Deep-ML), pp. 67–72 (2019)
Husnain, M., Saad Missen, M.M., Mumtaz, S., Jhanidr, M.Z., Coustaty, M., Muzzamil, L.M., Ogier, J., Choi, G.S.: Recognition of Urdu handwritten characters using convolutional neural network. Appl. Sci. 9(13), 2758 (2019)
Ul-Hasan, A., Ahmed, S.B., Rashid, F., Shafait, F., Breuel, T.M.: Offline printed Urdu Nastaleeq script recognition with bidirectional LSTM networks. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1061–1065 (2013)
Naz, S., Umar, A.I., Ahmed, R., Siddiqi, I., Ahmed, S., Razzak, M.I., Shafait, F.: Urdu Nastaliq recognition using convolutional–recursive deep learning. Neurocomputing 243, 80–87 (2017)
Naz, S., Umar, A.I., Ahmed, R., Razzak, M.I., Rashid, S.F., Shafait, F.: Urdu Nasta’liq text recognition using implicit segmentation based on multi-dimensional long short term memory neural networks. SpringerPlus 5, 1–16 (2016)
Khan, K., Haider, I.: Online recognition of multi-stroke handwritten Urdu characters. In: 2010 International Conference on Image Analysis and Signal Processing, pp. 284–290 (2010)
Ahmed, S., Naz, S., Razzak, S.M., Umar, A.: Ucom offline dataset—a Urdu handwritten dataset generation. Int. Arab J. Inf. Technol. 14(2), 03 (2016)
Devlin, J., Chang, M., Lee, K., Toutanova, K.: Bert: Pre-training of deep bidirectional transformers for language understanding
Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–901 (2020)
Michael, J., Labahn, R., Gruning, T., Zollner, J.: Evaluating sequence-to-sequence models for handwritten text recognition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1286–1293. IEEE (2019)
Li, M., Lv, T., Cui, L., Lu, Y., Florencio, D., Zhang, C., Li, Z., Wei, F.: Trocr: Transformer-based optical character recognition with pre-trained models. arXiv preprint arXiv:2109.10282 (2021)
Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jegou, H.: Training data-efficient image transformers & distillation through attention. In: Marina, M., Tong Z. (eds.), Proceedings of the 38th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 139, pp. 10347–10357. PMLR, 18–24 (2021)
Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jégou, H.: Training data-efficient image transformers & distillation through attention (2020)
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: Roberta: A robustly optimized Bert pretraining approach (2019)
Riaz, N., Latif, S., Latif, R.: From transformers to reformers. In: 2021 International Conference on Digital Futures and Transformative Technologies (ICoDT2), pp. 1–6 (2021)
O’Shea, K., Nash, R.: An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458 (2015)
Naeem, M.F., Zia, N., Awan, A., Ul-Hasan, A., Shafait, F.: Impact of ligature coverage on training practical Urdu OCR systems. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 01, pp. 131–136 (2017)
Rehman, A., Ul-Hasan, A., Shafait, F.: High performance Urdu and Arabic video text recognition using convolutional recurrent neural networks. In: International Conference on Document Analysis and Recognition, pp. 336–352. Springer (2021)
Ethics declarations
Conflict of interest
The author(s) disclosed no possible conflict of interest.
Additional information
This research is partially funded by the Higher Education Commission (HEC), Pakistan, through its grant for the National Center of Artificial Intelligence (NCAI).
About this article
Cite this article
Riaz, N., Arbab, H., Maqsood, A. et al. Conv-transformer architecture for unconstrained off-line Urdu handwriting recognition. IJDAR 25, 373–384 (2022). https://doi.org/10.1007/s10032-022-00416-5
DOI: https://doi.org/10.1007/s10032-022-00416-5