Abstract
We present the Urdu Offline Handwritten Text Dataset (UOHTD), a collection of 800 Urdu handwritten samples written by 800 native writers. The dataset consists of images of the written text samples scanned at multiple spatial resolutions. From the sample pages, 8,000 text lines and 40,000 words have been extracted as patches and checked manually and formally against a ground truth database. Machine learning tools were used to extract the sample pages and segment them into lines and words. Initial trials on demographic (gender and age group) classification of Urdu writers, using CNNs on UOHTD samples, have produced promising results (85% for gender and 79% for age group classification). The database will be made available to researchers worldwide for the study of handwriting-related topics, including text recognition; identification of the writer's age, ethnicity, demographics, gender, and handedness; and verification.
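The abstract does not say which tools were used to segment pages into lines. As a rough illustration of what line segmentation involves, the sketch below uses a classical horizontal projection profile, which is a swapped-in textbook heuristic and not the authors' method. The function name `segment_lines`, the `min_ink` parameter, and the 0/1 pixel convention are all assumptions made for this example.

```python
from typing import List, Tuple


def segment_lines(page: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges of text lines in a binarized page.

    page: rows of 0/1 pixels, where 1 marks ink (assumed convention).
    min_ink: hypothetical threshold; a row with at least this many ink
             pixels is treated as part of a text line.
    The bottom index of each range is exclusive.
    """
    # Horizontal projection profile: amount of ink in each pixel row.
    profile = [sum(row) for row in page]

    lines: List[Tuple[int, int]] = []
    start = None  # row index where the current text line began, if any
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                    # entering a text line
        elif ink < min_ink and start is not None:
            lines.append((start, y))     # leaving a text line
            start = None
    if start is not None:                # line touching the page bottom
        lines.append((start, len(page)))
    return lines


# Toy 6x4 "page" with two text lines separated by a blank row.
toy_page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
]
bands = segment_lines(toy_page)
```

A heuristic of this kind can fail on handwriting with overlapping or slanted lines, which is one reason the extracted patches would still need checking against ground truth.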
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Rafique, A., Ishtiaq, M. (2022). UOHTD: Urdu Offline Handwritten Text Dataset. In: Porwal, U., Fornés, A., Shafait, F. (eds) Frontiers in Handwriting Recognition. ICFHR 2022. Lecture Notes in Computer Science, vol 13639. Springer, Cham. https://doi.org/10.1007/978-3-031-21648-0_34
DOI: https://doi.org/10.1007/978-3-031-21648-0_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21647-3
Online ISBN: 978-3-031-21648-0
eBook Packages: Computer Science, Computer Science (R0)