Conv-ViT fusion for improved handwritten Arabic character classification

  • Original Paper
Signal, Image and Video Processing

Abstract

Handwriting recognition is an essential area of pattern recognition, particularly for scripts with diverse character styles such as Arabic. Arabic characters are difficult to recognize because of their varied writing styles, the intricate connections between letters within words, and the changes in a letter's shape depending on its position. The complexity of Arabic calligraphy further complicates recognition: letter connections can be subtle, and distortions may arise from writing speed and individual skill. In this paper, we therefore propose a method for recognizing handwritten Arabic characters using an ensemble of deep learning models. Our approach extracts hierarchical features from complex, high-resolution images using two pre-trained models: the Vision Transformer (ViT) and Inception ResNet V2. To enhance performance, we introduce tunable lambda coefficients for a weighted arithmetic fusion of the two models. Experiments on the HMBD dataset, divided into subsets according to the character's writing position, yielded promising results: the ensemble achieved test accuracies ranging from 89% to 98% across these subsets. Error analysis revealed that the remaining errors stem primarily from visual-spatial similarities between certain characters and from inaccuracies in the ground-truth labels. Our contribution highlights the efficacy of ensemble approaches that combine transformers and CNNs in addressing the intricacies of handwritten Arabic recognition.
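The lambda-weighted fusion mentioned above can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: each model outputs an (N, C) array of softmax class probabilities, a single coefficient λ weights the ViT branch while 1 - λ weights the Inception ResNet V2 branch, and λ is chosen by grid search on a held-out validation split. The names `fuse_probs`, `accuracy`, and `tune_lambda` and the toy probability arrays are hypothetical; the sketch uses only NumPy and does not load either network.

```python
import numpy as np


def fuse_probs(p_vit: np.ndarray, p_cnn: np.ndarray, lam: float) -> np.ndarray:
    """Weighted arithmetic fusion of two (N, C) probability arrays.

    Assumption: lam weights the ViT branch and (1 - lam) the CNN branch,
    so each fused row still sums to 1.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_vit + (1.0 - lam) * p_cnn


def accuracy(probs: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples whose highest-probability class matches the label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))


def tune_lambda(p_vit, p_cnn, labels, grid=None):
    """Hypothetical grid search for lam on validation data; returns the
    first lam that reaches the best accuracy, together with that accuracy."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 11)
    best_lam, best_acc = 0.0, -1.0
    for lam in grid:
        acc = accuracy(fuse_probs(p_vit, p_cnn, float(lam)), labels)
        if acc > best_acc:
            best_lam, best_acc = float(lam), acc
    return best_lam, best_acc


# Toy validation outputs for 3 samples and 2 classes (illustrative only).
p_vit = np.array([[0.90, 0.10], [0.40, 0.60], [0.45, 0.55]])
p_cnn = np.array([[0.20, 0.80], [0.30, 0.70], [0.90, 0.10]])
labels = np.array([0, 1, 0])

best_lam, best_acc = tune_lambda(p_vit, p_cnn, labels)
print(f"ViT alone: {accuracy(p_vit, labels):.3f}")
print(f"CNN alone: {accuracy(p_cnn, labels):.3f}")
print(f"Fused (lam={best_lam:.1f}): {best_acc:.3f}")
```

Constraining the two weights to sum to 1 keeps each fused row a valid probability distribution, and tuning λ on validation data rather than on the test set avoids inflating the reported test accuracy. In the toy data, each model alone is correct on two of the three samples, while the fused prediction at λ = 0.5 is correct on all three, which is the kind of complementarity an ensemble relies on.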


Figs. 1–10


Data Availability Statement

generated or analyzed during the current study.


Funding

No funding was received to assist with the preparation of this manuscript.

Author information


Contributions

All authors participated in (a) conception and design, or analysis and interpretation of the data; (b) drafting the article or revising it critically for important intellectual content; and (c) approval of the final version.

Corresponding author

Correspondence to Sarra Rouabhi.

Ethics declarations

Conflicts of interest

A conflict-of-interest statement is not applicable to this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Rouabhi, S., Azerine, A., Tlemsani, R. et al. Conv-ViT fusion for improved handwritten Arabic character classification. SIViP (2024). https://doi.org/10.1007/s11760-024-03158-5


  • DOI: https://doi.org/10.1007/s11760-024-03158-5
