Abstract
Arabic handwritten documents consist of unstructured, heterogeneous content. The information these documents can provide is very valuable, both historically and educationally. However, extracting content from historical documents with Optical Character Recognition (OCR) remains an open problem because of the poor quality of the handwriting. Furthermore, these documents often show various forms of deterioration (e.g., watermarks). In this paper, we propose a CycleGAN-based approach that generates, from a historical Arabic handwritten document, a version in a readable font style, trained on a collection of unlabeled images. Arabic OCR is then used for content extraction.
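The abstract does not give the generator architecture, the loss weights, or the image format. As a hedged illustration of why a CycleGAN can learn from unpaired, unlabeled images, here is a minimal sketch of the cycle-consistency term from the original CycleGAN formulation: translating a historical page to the readable style and back should recover the original, and vice versa. The generators `g` and `f`, the flattened-list image representation, and the weight `lam = 10.0` are illustrative assumptions, not the authors' code; the adversarial losses that a full CycleGAN also optimizes are omitted.

```python
from typing import Callable, List

Image = List[float]  # assumption: a flattened grayscale image, pixels in [0, 1]
Generator = Callable[[Image], Image]


def l1(a: Image, b: Image) -> float:
    """Mean absolute (L1) difference between two images of equal size."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(
    g: Generator,  # hypothetical: historical handwriting -> readable style
    f: Generator,  # hypothetical: readable style -> historical handwriting
    x: Image,  # unpaired sample from the historical-document domain
    y: Image,  # unpaired sample from the readable-font domain
    lam: float = 10.0,  # assumed weight (the original CycleGAN default)
) -> float:
    """lam * (|F(G(x)) - x|_1 + |G(F(y)) - y|_1), each averaged per pixel."""
    forward_cycle = l1(f(g(x)), x)   # historical -> readable -> historical
    backward_cycle = l1(g(f(y)), y)  # readable -> historical -> readable
    return lam * (forward_cycle + backward_cycle)


if __name__ == "__main__":
    # Toy generators: inversion is its own inverse, so both cycles are exact.
    invert = lambda img: [1.0 - p for p in img]
    print(cycle_consistency_loss(invert, invert, [0.25, 0.75], [0.25, 0.75]))

    # A lossy "generator" that clips bright pixels breaks the cycle.
    clip = lambda img: [min(p, 0.5) for p in img]
    identity = lambda img: list(img)
    print(cycle_consistency_loss(clip, identity, [0.25, 0.75], [0.25, 0.625]))
```

In this toy run the inverting pair yields a loss of 0.0, while the clipping pair yields a positive loss because information is destroyed on the way through. During training, this penalty is what pushes a restyling generator to keep the textual content of the page while changing its visual style, which is the property the restyled output needs before OCR.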
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Erromh, M.A., Nakouri, H., Boukhris, I. (2023). GAN Based Restyling of Arabic Handwritten Historical Documents. In: Abraham, A., Hong, TP., Kotecha, K., Ma, K., Manghirmalani Mishra, P., Gandhi, N. (eds) Hybrid Intelligent Systems. HIS 2022. Lecture Notes in Networks and Systems, vol 647. Springer, Cham. https://doi.org/10.1007/978-3-031-27409-1_49
DOI: https://doi.org/10.1007/978-3-031-27409-1_49
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27408-4
Online ISBN: 978-3-031-27409-1
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)