Conv-ViT fusion for improved handwritten Arabic character classification

Rouabhi, Sarra; Azerine, Abdennour; Tlemsani, Redouane; Essaid, Mokhtar; Idoumghar, Lhassane

doi:10.1007/s11760-024-03158-5


Original Paper
Published: 29 April 2024

Signal, Image and Video Processing (2024)

Sarra Rouabhi¹, Abdennour Azerine², Redouane Tlemsani³, Mokhtar Essaid² & Lhassane Idoumghar²


Abstract

Handwriting recognition is an essential area of pattern recognition, particularly for languages with diverse character styles such as Arabic. Arabic characters are challenging because of their varied writing styles, their intricate interconnections within words, and the way their shapes change with position. The complexity of Arabic calligraphy further complicates recognition, since letter connections are subtle and strokes may be distorted depending on writing speed and individual skill. In this paper, we therefore propose a method for recognizing handwritten Arabic characters using an ensemble of deep learning models. Our approach extracts hierarchical features from complex, high-resolution images using two pre-trained models: the Vision Transformer (ViT) and Inception ResNet V2. To enhance performance, we introduce tunable lambda coefficients for the weighted arithmetic combination of the two models. Experiments on the HMBD dataset, divided into subsets according to the characters' writing positions, yielded promising results: our ensemble model achieved robust test accuracies ranging from 89% to 98% across these subsets. Error analysis revealed that the remaining errors stem primarily from visual-spatial similarities between certain characters and from inaccuracies in the ground-truth labels. Our contribution highlights the efficacy of ensemble approaches that combine transformers and CNNs in addressing the intricacies of handwritten Arabic recognition.
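The weighted arithmetic fusion mentioned above can be illustrated with a minimal NumPy sketch. It assumes that each model outputs per-class probabilities (softmax scores) and that a single coefficient `lam` weights the ViT branch while `1 - lam` weights the Inception ResNet V2 branch; the paper's exact parameterization of its lambda coefficients, and how they are tuned, may differ. The function names `fuse` and `tune_lambda`, the grid search over a held-out validation split, and the toy probability arrays are all hypothetical and serve only to show the mechanics.

```python
import numpy as np


def fuse(p_vit: np.ndarray, p_cnn: np.ndarray, lam: float) -> np.ndarray:
    """Weighted arithmetic mean of two (n_samples, n_classes) probability arrays.

    Assumption: lam weights the ViT branch and (1 - lam) the CNN
    (Inception ResNet V2) branch, so rows that sum to 1 in both inputs
    still sum to 1 after fusion.
    """
    if p_vit.shape != p_cnn.shape:
        raise ValueError("both models must score the same samples and classes")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_vit + (1.0 - lam) * p_cnn


def tune_lambda(p_vit, p_cnn, y_true, grid=None):
    """Grid-search lam on a validation split, maximising top-1 accuracy.

    Ties keep the smallest lam in the grid (strict '>' comparison).
    """
    if grid is None:
        grid = np.linspace(0.0, 1.0, 21)  # 0.00, 0.05, ..., 1.00
    best_lam, best_acc = None, -1.0
    for lam in grid:
        preds = fuse(p_vit, p_cnn, float(lam)).argmax(axis=1)
        acc = float(np.mean(preds == y_true))
        if acc > best_acc:
            best_lam, best_acc = float(lam), acc
    return best_lam, best_acc


# Hypothetical validation-set probabilities for 3 samples and 3 classes.
p_vit = np.array([[0.60, 0.30, 0.10],
                  [0.20, 0.50, 0.30],
                  [0.40, 0.45, 0.15]])
p_cnn = np.array([[0.30, 0.50, 0.20],
                  [0.10, 0.70, 0.20],
                  [0.70, 0.20, 0.10]])
y_val = np.array([0, 1, 0])

lam, acc = tune_lambda(p_vit, p_cnn, y_val)
print(f"best lambda = {lam:.2f}, validation accuracy = {acc:.2f}")
```

In this toy example each model alone misclassifies one of the three samples, while a suitably weighted mixture classifies all three correctly. Selecting `lam` on a validation split rather than on the test split keeps the reported test accuracy unbiased.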


Data Availability Statement

generated or analyzed during the current study.


Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Abdennour Azerine, Redouane Tlemsani, Mokhtar Essaid and Lhassane Idoumghar have contributed equally to this work.

Authors and Affiliations

Laboratoire de Codage et Sécurité d’Information (LACOSI) - Département d’Electronique - Faculté de Génie Electrique, Université des Sciences et de la Technologie d’Oran Mohamed Boudiaf, USTO-MB, BP 1505, 31000, EL M’naouer, Oran, Algeria
Sarra Rouabhi
IRIMAS UR 7499, Université de Haute-Alsace, 68100, Mulhouse, France
Abdennour Azerine, Mokhtar Essaid & Lhassane Idoumghar
Département d’Informatique - Faculté des Mathématique et Informatique, Université des Sciences et de la Technologie d’Oran Mohamed Boudiaf, USTO-MB, BP 1505, 31000, EL M’naouer, Oran, Algeria
Redouane Tlemsani


Contributions

All authors participated in (a) the conception and design of the study, or the analysis and interpretation of the data; (b) drafting the article or revising it critically for important intellectual content; and (c) approval of the final version.

Corresponding author

Correspondence to Sarra Rouabhi.

Ethics declarations

Conflicts of interest

A competing-interests statement does not apply to this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Rouabhi, S., Azerine, A., Tlemsani, R. et al. Conv-ViT fusion for improved handwritten Arabic character classification. SIViP (2024). https://doi.org/10.1007/s11760-024-03158-5


Received: 24 December 2023
Revised: 07 March 2024
Accepted: 16 March 2024
Published: 29 April 2024
DOI: https://doi.org/10.1007/s11760-024-03158-5
