Abstract
Handwriting recognition is an essential area of pattern recognition, particularly for languages with diverse character styles such as Arabic. Arabic characters are difficult to recognize for three reasons: they are written in varied styles, letters connect to one another within words in intricate ways, and each letter changes shape according to its position in the word. The complexity of Arabic calligraphy adds to the difficulty. Letter connections can be subtle, and characters may be distorted depending on the writer's speed and skill. In this paper, we therefore propose a method for recognizing handwritten Arabic characters using an ensemble of deep learning models. Our approach uses two pre-trained models, the Vision Transformer (ViT) and Inception-ResNet-V2, to extract hierarchical features from complex, high-resolution images. To improve performance, we introduce tunable lambda (λ) coefficients that combine the two models through a weighted arithmetic sum. We ran experiments on the HMBD dataset, divided into subsets according to writing position, and obtained promising results: the ensemble reached test accuracies of 89% to 98% across these subsets. Error analysis showed that the remaining errors stem mainly from visual and spatial similarities between certain characters and from inaccuracies in the ground-truth labels. These results highlight the effectiveness of ensembles that combine transformers and CNNs for handling the intricacies of handwritten Arabic character recognition.
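The abstract states that the ViT and Inception-ResNet-V2 outputs are fused with tunable λ coefficients but does not give the formula. The sketch below illustrates one plausible reading under stated assumptions and is not the authors' implementation. It assumes that each model produces per-class softmax probabilities and that the fused score is λ·p_ViT + (1 − λ)·p_CNN. It also assumes that λ is chosen by grid search on a held-out validation split. The function names (`fuse`, `tune_lambda`) and the toy probabilities are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: each row holds one sample's softmax probabilities over the classes.
Probs = List[List[float]]


def fuse(p_vit: Probs, p_cnn: Probs, lam: float) -> Probs:
    """Hypothetical weighted arithmetic fusion: lam * ViT + (1 - lam) * CNN."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_v, row_c)]
        for row_v, row_c in zip(p_vit, p_cnn)
    ]


def argmax(row: Sequence[float]) -> int:
    # Index of the largest score; on ties, max() returns the first index.
    return max(range(len(row)), key=lambda k: row[k])


def accuracy(probs: Probs, labels: Sequence[int]) -> float:
    hits = sum(argmax(row) == y for row, y in zip(probs, labels))
    return hits / len(labels)


def tune_lambda(p_vit: Probs, p_cnn: Probs, labels: Sequence[int],
                steps: int = 10) -> Tuple[float, float]:
    """Grid-search lam over {0, 1/steps, ..., 1}; return (best lam, its accuracy)."""
    best_lam, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        lam = i / steps
        acc = accuracy(fuse(p_vit, p_cnn, lam), labels)
        if acc > best_acc:  # strict '>' keeps the smallest lam among ties
            best_lam, best_acc = lam, acc
    return best_lam, best_acc


if __name__ == "__main__":
    # Toy validation data (hypothetical): 3 samples, 2 classes.
    p_vit = [[0.6, 0.4], [0.45, 0.55], [0.2, 0.8]]
    p_cnn = [[0.3, 0.7], [0.9, 0.1], [0.4, 0.6]]
    labels = [0, 0, 1]
    print(accuracy(p_vit, labels), accuracy(p_cnn, labels))  # each model alone: 2 of 3 correct
    print(tune_lambda(p_vit, p_cnn, labels))  # (0.7, 1.0)
```

In this toy case, each model alone misclassifies one sample, but a weighting of λ = 0.7 corrects both errors. The predicted class is the argmax of the fused scores, so dividing two independent non-negative weights λ1 and λ2 by their sum leaves every prediction unchanged. A single λ in [0, 1] therefore covers any such pair. Tuning λ on validation data rather than on the test subsets prevents the reported test accuracy from being optimistically biased.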
Data Availability Statement
generated or analyzed during the current study.
Funding
No funding was received to assist with the preparation of this manuscript.
Author information
Contributions
All authors participated in (a) the conception and design of the study, or the analysis and interpretation of the data; (b) drafting the article or revising it critically for important intellectual content; and (c) approval of the final version.
Ethics declarations
Conflict of interest
A competing interests statement is not applicable to this manuscript.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Rouabhi, S., Azerine, A., Tlemsani, R. et al. Conv-ViT fusion for improved handwritten Arabic character classification. SIViP (2024). https://doi.org/10.1007/s11760-024-03158-5
DOI: https://doi.org/10.1007/s11760-024-03158-5