Abstract
In this work, we propose a texture-based approach to separate handwritten from machine-printed words written in Arabic and Latin scripts. The idea is to exploit differences in writing orientation and in stroke length to discriminate between these scripts. To that end, we designed a K-nearest-neighbors classifier trained on a set of texture features. These features are extracted from black run-length (BRL) histograms and appear well suited to capturing structural characteristics of word images. We evaluate four feature extraction scenarios, (1) BRL, (2) restricted BRL, (3) BRL statistics, and (4) restricted BRL combined with their statistics, to demonstrate the potential of such a texture-based approach in script identification. With these features, we obtained very promising results: the correct identification rate is higher than 98.92 % in our experiments.
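To make the feature idea concrete, the following is a minimal sketch of how a black run-length histogram could be computed from a binarized word image and passed to a nearest-neighbor vote. It is an illustration under stated assumptions, not the authors' implementation: the function names (`black_run_lengths`, `brl_histogram`, `knn_predict`), the `max_len` cap, the scan directions, and the Euclidean distance are all hypothetical choices, and the "restricted" BRL and statistics variants named in the abstract are not reproduced because the abstract does not define them.

```python
import numpy as np

def black_run_lengths(line):
    """Lengths of consecutive runs of black pixels (value 1) in a 1-D array."""
    # Pad with white (0) so runs touching the border are closed properly.
    padded = np.concatenate(([0], np.asarray(line, dtype=np.int8), [0]))
    changes = np.diff(padded)
    starts = np.flatnonzero(changes == 1)   # white -> black transitions
    ends = np.flatnonzero(changes == -1)    # black -> white transitions
    return ends - starts

def brl_histogram(image, max_len=8, axis=1):
    """Normalized histogram of black run lengths.

    axis=1 scans rows (horizontal runs), axis=0 scans columns (vertical runs).
    Runs longer than max_len fall into the last bin (an assumed convention).
    """
    img = np.asarray(image)
    lines = img if axis == 1 else img.T
    hist = np.zeros(max_len, dtype=float)
    for line in lines:
        for length in black_run_lengths(line):
            hist[min(int(length), max_len) - 1] += 1
    total = hist.sum()
    return hist / total if total > 0 else hist

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to the query."""
    dists = np.linalg.norm(np.asarray(train_x) - np.asarray(query), axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.asarray(train_y)[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy binary "word image": 1 = black (ink), 0 = white (background).
toy = np.array([[1, 1, 0, 1],
                [0, 1, 1, 1]])
horizontal = brl_histogram(toy, max_len=4, axis=1)
vertical = brl_histogram(toy, max_len=4, axis=0)
feature_vector = np.concatenate([horizontal, vertical])
```

In practice, one feature vector per word image would be built this way (possibly over more scan directions) and the labeled vectors would serve as the training set for `knn_predict`; the value of `k` and the distance measure would need to be tuned on real data.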
Cite this article
Kacem, A., Saïdani, A. A texture-based approach for word script and nature identification. Pattern Anal Applic 20, 1157–1167 (2017). https://doi.org/10.1007/s10044-016-0555-x