Abstract
Most segmentation algorithms for Indian scripts require some prior knowledge about the structure of a handwritten word to efficiently fragment the word into its constituent characters. Zone detection is a widely used strategy for this purpose, and headline estimation is a salient part of zone detection. In the present work, we propose a method that uses simple linear regression to estimate the headlines present in handwritten words. The method efficiently detects headlines in three Indian scripts, namely Bangla, Devanagari, and Gurmukhi. It detects headlines in skewed word images and provides accurate results even when the headline is discontinuous or mostly absent. We compare our method with a recent work to demonstrate the efficacy of the proposed methodology.
1 Introduction
Segmentation is one of the most consequential phases in optical character recognition. The cursiveness of Indian scripts makes segmentation much harder [1]. Most segmentation algorithms for Indian scripts require some prior knowledge about the structure of a handwritten word to efficiently fragment the word into its constituent characters. Zone detection is a widely used strategy for this purpose. It separates a word into three zones, namely the upper, middle, and lower zones. The upper zone is detected by exploiting the headline, a distinctive feature present in most Indian scripts. Sarkar et al. [2] computed the headline in Bangla words by extracting horizontalness and verticalness features from the words. Roy et al. [3] estimated the headline in Bangla words using the word height, horizontal projection analysis, and certain heuristics. Bag and Krishna [4] used the horizontal density row and the local maximum row to detect headlines in handwritten Hindi words. However, these methods suffer when words are skewed or when the headline is discontinuous or mostly absent. Furthermore, few existing methods can handle multiple scripts within a single document.
In the present work, we propose a method that uses simple linear regression to estimate the headlines present in handwritten words. The method efficiently detects headlines in three Indian scripts, namely Bangla, Devanagari, and Gurmukhi. It can be applied to word images extracted from documents containing multiple headline-based scripts without any prior knowledge of the scripts. The proposed method detects headlines in skewed word images and provides accurate results even when the headline is discontinuous or mostly absent.
The rest of the paper is organized as follows. The proposed methodology is delineated in Sect. 2. Sect. 3 discusses the experimental results and analysis, followed by the conclusion in Sect. 4.
2 Proposed Method
Most Indian scripts have a distinctive feature called the headline (also known as \(m\bar{a}tr\bar{a}\) in Bangla and \(shirorekh\bar{a}\) in Devanagari) (Fig. 1). All the characters of a word are connected by the headline at the upper portion of the word, although the headline is sometimes discontinuous depending on the individual's handwriting. We propose a strategy that utilises this distinctive feature to estimate the headline present in word images. The method employed is simple and effective.
2.1 Preprocessing
Initially, we binarize each gray-level word image \( \tau _{k} \) (Fig. 2a) and denote the binarized image as \( \nu (\tau _{k}) \). We use the Rosenfeld and Kak component labelling algorithm to label all connected components in \( \nu (\tau _{k}) \) and compute the size (total number of pixels) of each component. As a noise-normalization step, we then remove every connected component that lies in the top three-fourths of \( \nu (\tau _{k}) \) and contains fewer than \( \rho \) (= 30) pixels. The optimal value of \( \rho \) was validated on a set of 150 word images.
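The noise-normalization step can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' MATLAB code, and several parts are assumptions: the fixed global threshold in `binarize` (the paper does not name its binarization method), the BFS flood-fill labelling with 8-connectivity (swapped in for the Rosenfeld and Kak algorithm; any correct labelling yields the same components for a given connectivity), and the reading of "lies in the top three-fourths" as "every pixel of the component is above row 0.75m". All function names are hypothetical.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Map a gray-level image (list of rows, values 0-255) to 1 = ink, 0 = background.
    The fixed global threshold is a hypothetical choice."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def label_components(img):
    """Label 8-connected foreground components by BFS flood fill.
    Returns a list of components, each a list of (row, col) pixels."""
    m, n = len(img), len(img[0])
    seen = [[False] * n for _ in range(m)]
    comps = []
    for r in range(m):
        for c in range(n):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, comp = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if 0 <= ny < m and 0 <= nx < n and img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(comp)
    return comps


def remove_small_upper_components(img, rho=30):
    """Delete components with fewer than rho pixels lying entirely in the
    top three-fourths of the image (assumed reading of the noise rule)."""
    m = len(img)
    out = [row[:] for row in img]
    for comp in label_components(img):
        lies_in_top = max(r for r, _ in comp) < 0.75 * m
        if len(comp) < rho and lies_in_top:
            for r, c in comp:
                out[r][c] = 0
    return out
```

A small speck near the top of the image is deleted, while a speck in the bottom quarter survives, which keeps lower-zone strokes intact under this reading of the rule.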
2.2 Headline Estimation
We take \( \nu (\tau _{k}) \) of dimension \(m \times n\) as input (Fig. 2b) and select q of its n columns at a fixed spacing of 12% of the width of \( \nu (\tau _{k}) \); this percentage was validated on a subset of 350 images. The selected columns are denoted as \(\mathcal {Q}=<c_1,c_2,\cdots ,c_q>\). While traversing each column in \(\mathcal {Q}\) from top to bottom, we store the first foreground pixel encountered. The stored pixels are denoted as \(\mathcal {P}=<p_1,p_2,\cdots ,p_q>\) (Fig. 2c), and the row and column numbers of each pixel \(p_i\) are denoted as \(p_i(r)\) and \(p_i(c)\), respectively. We use three sets, \(\mathcal {E} \), \(\mathcal {E'} \), and \(\mathcal {I} \), to classify the pixels in \(\mathcal {P} \) as eligible, ineligible, and intermediate, respectively. \(\mathcal {E} \) contains the pixels that remain eligible for headline estimation. \(\mathcal {E'}\) contains the pixels that will be deleted from \(\mathcal {P} \) and excluded from further computation. \(\mathcal {I} \) temporarily holds pairs of pixels until each pixel is assigned to \(\mathcal {E} \) or \(\mathcal {E'}\). For every three consecutive pixels \(p_i\), \(p_{i+1}\), and \(p_{i+2}\) in \(\mathcal {P}\), we evaluate the angle \(\angle p_{i}p_{i+1}p_{i+2}\) (denoted as \(\theta _{q}\)). If \(\theta _{q} \le 165^{\circ }\), we conclude that at least one of the three pixels is not a headline pixel. To determine which one, we compute \(|p_i(r)-p_{i+1}(r)|\) and \(|p_{i+1}(r)-p_{i+2}(r)|\). Since the selected columns are equidistant, consecutive pixels in \(\mathcal {P}\) are separated by (almost) the same column offset, so the column values contribute equally to both distances; we therefore consider only the row values in this distance computation.
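The column sampling, the top-to-bottom scan, and the angle test can be sketched in Python as below, on a binary image stored as a list of rows (1 = ink). This is a minimal sketch under stated assumptions, not the authors' implementation: starting the sampling at column 0, rounding the 12% step to the nearest integer, skipping columns that contain no ink, and all function names are hypothetical.

```python
import math


def sample_columns(n, spacing_frac=0.12):
    """Column indices spaced by 12% of the image width n.
    Starting at column 0 is an assumption; the text fixes only the spacing."""
    step = max(1, round(spacing_frac * n))
    return list(range(0, n, step))


def topmost_pixels(img, columns):
    """First foreground pixel (row, col) met scanning each column top to bottom.
    Columns without ink are skipped (assumption)."""
    P = []
    for c in columns:
        for r in range(len(img)):
            if img[r][c]:
                P.append((r, c))
                break
    return P


def angle_deg(p, q, s):
    """Angle p-q-s at vertex q, in degrees; pixels are (row, col) tuples."""
    ax, ay = p[1] - q[1], p[0] - q[0]
    bx, by = s[1] - q[1], s[0] - q[0]
    # atan2(|cross|, dot) lies in [0, 180] degrees; collinear points give 180.
    return math.degrees(math.atan2(abs(ax * by - ay * bx), ax * bx + ay * by))


def sharp_triples(P, theta=165.0):
    """Start indices i whose triple (p_i, p_{i+1}, p_{i+2}) has angle <= theta."""
    return [i for i in range(len(P) - 2) if angle_deg(P[i], P[i + 1], P[i + 2]) <= theta]
```

For a 100-pixel-wide word the step is 12 pixels, so nine columns (0, 12, ..., 96) are sampled. If one sampled column hits an upper modifier above an otherwise straight headline, the three triples touching that pixel fall below 165°, while the rest stay at 180°.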
The angle threshold of 165\(^{\circ }\) was validated on a subset of 150 word images. If \(|p_i(r)-p_{i+1}(r)|> |p_{i+1}(r)-p_{i+2}(r)|\), we conclude that either \(p_i\) or \(p_{i+1}\) is not a headline pixel, and we store {\(p_i\), \(p_{i+1}\)} in \(\mathcal {I} \) as an intermediate pixel pair. Otherwise, we conclude that either \(p_{i+1}\) or \(p_{i+2}\) is not a headline pixel and store {\(p_{i+1}\), \(p_{i+2}\)} in \(\mathcal {I} \) instead. If a pixel \(p_i\) in \(\mathcal {P} \) is marked as intermediate twice in a single iteration, we conclude that \(p_i\) is not a headline pixel: \(p_i\) is transferred from \(\mathcal {I} \) to \(\mathcal {E'} \), and the pixels paired with \(p_i\) are removed from \(\mathcal {I} \). Once all the intermediate pixels of an iteration are marked, the eligible pixels are computed as \(\mathcal {E} = \mathcal {P} - (\mathcal {I} \cup \mathcal {E'})\).
For every pixel pair {\( p_i \), \( p_{i+1} \)} in \(\mathcal {I} \), we compute the row-wise differences \(df_{p_i}\) and \(df_{p_{i+1}}\) of \( p_{i} \) and \( p_{i+1} \) from every pixel in \(\mathcal {E}\), and for each pixel in \(\mathcal {E}\) we take the larger of the two differences as \(max_{df}\). A non-headline pixel is expected to have a greater row difference from the headline pixels than the headline pixels have from one another. Hence, the pixel in the pair {\( p_i \), \( p_{i+1} \)} that accounts for \(max_{df}\) most often is transferred from \(\mathcal {I} \) to \(\mathcal {E'}\), while the other is transferred to \(\mathcal {E}\). Once all the pixel pairs in \(\mathcal {I} \) are checked, the pixels in \(\mathcal {E'} \) are removed from \(\mathcal {P} \) (Fig. 2d), and \(\mathcal {E} \), \(\mathcal {I} \), and \(\mathcal {E'} \) are emptied. This procedure is repeated until no three consecutive pixels in \(\mathcal {P} \) form an angle less than or equal to 165\(^{\circ }\). Removing ineligible pixels from \(\mathcal {P} \) ensures that the headline estimate is not affected by upper modifiers and certain consonants that appear above the headline of a word, as several upper modifiers and consonants do in Bangla script. Examples of headline estimation for words containing such modifiers in Indian scripts are shown in Sect. 3.
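A compact sketch of the whole elimination loop follows, with pixels as (row, col) tuples in left-to-right order. It is a hedged illustration rather than the authors' code, and three details not fixed by the text are assumptions flagged in comments: a pair flagged by two overlapping triples is counted once, a tie in the \(max_{df}\) vote removes the right-hand pixel of the pair, and if \(\mathcal {E}\) is empty the pair is compared with the other pixels of \(\mathcal {P}\). Function names are hypothetical.

```python
import math
from collections import Counter


def angle_deg(p, q, s):
    """Angle p-q-s at vertex q, in degrees; pixels are (row, col) tuples."""
    ax, ay = p[1] - q[1], p[0] - q[0]
    bx, by = s[1] - q[1], s[0] - q[0]
    return math.degrees(math.atan2(abs(ax * by - ay * bx), ax * bx + ay * by))


def one_iteration(P, theta=165.0):
    """One pass of the eligibility test; returns the set E' of pixels to delete."""
    pairs = []
    for i in range(len(P) - 2):
        a, b, c = P[i], P[i + 1], P[i + 2]
        if angle_deg(a, b, c) <= theta:
            # The larger row jump decides which pair is suspicious.
            pair = (a, b) if abs(a[0] - b[0]) > abs(b[0] - c[0]) else (b, c)
            if pair not in pairs:  # assumption: count a repeated pair once
                pairs.append(pair)
    counts = Counter(p for pair in pairs for p in pair)
    ineligible = {p for p, k in counts.items() if k >= 2}  # marked twice -> E'
    pairs = [pr for pr in pairs if not (set(pr) & ineligible)]  # free partners
    intermediate = {p for pr in pairs for p in pr}
    eligible = [p for p in P if p not in intermediate and p not in ineligible]
    for x, y in pairs:
        # Assumption: fall back to the rest of P when E is empty.
        ref = eligible or [p for p in P if p not in (x, y)]
        votes_x = sum(abs(x[0] - e[0]) > abs(y[0] - e[0]) for e in ref)
        votes_y = sum(abs(y[0] - e[0]) > abs(x[0] - e[0]) for e in ref)
        ineligible.add(x if votes_x > votes_y else y)  # assumption: tie -> y
    return ineligible


def filter_headline_pixels(P, theta=165.0):
    """Repeat until no three consecutive pixels form an angle <= theta."""
    P = list(P)
    while True:
        bad = one_iteration(P, theta)
        if not bad:
            return P
        P = [p for p in P if p not in bad]
```

Every iteration that finds a sharp angle deletes at least one pixel (either a twice-marked pixel or one member of each remaining pair), so the loop always terminates.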
We use a handwritten word (Fig. 3) as a working example to demonstrate the proposed methodology. Because of a consonant that extends above the headline, the second pixel \(p_{2}\) in \(\mathcal {P} \) lies well above the headline (Fig. 3a). As a result, for the first three pixels in \(\mathcal {P} \), \(\angle p_{1}p_{2}p_{3} \le 165^{\circ }\) and \(|p_{1}(r) - p_{2}(r)| > |p_{2}(r) - p_{3}(r)|\). We conclude that either \(p_{1}\) or \(p_{2}\) is a non-headline pixel and store {\(p_{1}\), \(p_{2}\)} in \(\mathcal {I} \) as a pixel pair (Fig. 3a). Shifting one pixel to the right, for the next three pixels \(\angle p_{2}p_{3}p_{4} \le 165^{\circ }\) and \(|p_{2}(r) - p_{3}(r)| > |p_{3}(r) - p_{4}(r)|\). So we conclude that either \(p_{2}\) or \(p_{3}\) is a non-headline pixel and store {\(p_{2}\), \(p_{3}\)} in \(\mathcal {I} \) (Fig. 3b). Since \(p_{2}\) appears in two consecutive pixel pairs in \(\mathcal {I} \), we infer that \(p_{2}\) is a non-headline pixel. We therefore transfer \(p_{2}\) from \(\mathcal {I} \) to \(\mathcal {E'} \) and remove the pixels paired with it, \(p_{1}\) and \(p_{3}\), from \(\mathcal {I} \) (Fig. 3c). We then check all the remaining consecutive pixel triples in \(\mathcal {P} \). Once checking is complete, we remove the ineligible pixels in \(\mathcal {E'} \) from \(\mathcal {P} \) (Fig. 3d).
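The pair-marking step of this worked example can be traced with the sketch below. The (row, col) values are hypothetical, chosen only so that \(p_2\) sits well above the headline as described for Fig. 3; they are not measurements from the paper, and the function name `mark_pairs` is illustrative.

```python
import math
from collections import Counter


def angle_deg(p, q, s):
    """Angle p-q-s at vertex q, in degrees; pixels are (row, col) tuples."""
    ax, ay = p[1] - q[1], p[0] - q[0]
    bx, by = s[1] - q[1], s[0] - q[0]
    return math.degrees(math.atan2(abs(ax * by - ay * bx), ax * bx + ay * by))


def mark_pairs(P, theta=165.0):
    """Pair-marking step of one iteration.
    Returns (pairs stored in I, pixels moved to E', pairs still left in I)."""
    pairs = []
    for i in range(len(P) - 2):
        a, b, c = P[i], P[i + 1], P[i + 2]
        if angle_deg(a, b, c) <= theta:
            pair = (a, b) if abs(a[0] - b[0]) > abs(b[0] - c[0]) else (b, c)
            if pair not in pairs:
                pairs.append(pair)
    counts = Counter(p for pair in pairs for p in pair)
    ineligible = {p for p, k in counts.items() if k >= 2}  # marked twice -> E'
    remaining = [pr for pr in pairs if not (set(pr) & ineligible)]
    return pairs, ineligible, remaining


# Hypothetical (row, col) values: p2 lies well above the headline.
p1, p2, p3, p4, p5 = (10, 0), (2, 12), (8, 24), (10, 36), (10, 48)
pairs, ineligible, remaining = mark_pairs([p1, p2, p3, p4, p5])
print(pairs)       # [((10, 0), (2, 12)), ((2, 12), (8, 24))]
print(ineligible)  # {(2, 12)}
print(remaining)   # []
```

As in the text, \(p_2\) is moved to \(\mathcal {E'}\), and \(p_1\) and \(p_3\) leave \(\mathcal {I}\) and become eligible; the triple \(p_3p_4p_5\) forms an angle of about 170.5°, so it is not flagged.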
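Now, we fit a straight line to the remaining pixels in \(\mathcal {P} \) by simple linear regression, predicting the row values \(\hat{\mathcal {P}(r)}\) from the row and column values of the pixels in \(\mathcal {P} \) using the following equation:
\[ \hat{p}_i(r) = \beta_0 + \beta_1\, p_i(c), \]
where
\[ \beta_1 = \frac{\sum_{i}\bigl(p_i(c)-\bar{c}\bigr)\bigl(p_i(r)-\bar{r}\bigr)}{\sum_{i}\bigl(p_i(c)-\bar{c}\bigr)^2}, \qquad \beta_0 = \bar{r} - \beta_1\, \bar{c}, \]
and \(\bar{c}\) and \(\bar{r}\) are the mean column and row values of the pixels in \(\mathcal {P} \). We use the polyfit function in MATLAB to compute these coefficients. Based on the \(\hat{\mathcal {P}(r)}\) and \(\mathcal {P}(c)\) values, we draw a regression line, which gives the final estimated headline of each word (Fig. 2e).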
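The line fit itself can be sketched without any library. MATLAB's `polyfit(x, y, 1)` and NumPy's `numpy.polyfit(x, y, 1)` solve this same ordinary least-squares problem; the sketch below uses the standard closed-form slope and intercept of simple linear regression. The function names and the sample pixels are hypothetical.

```python
def fit_headline(P):
    """Ordinary least-squares line row = b0 + b1 * col through P = [(row, col), ...].
    Needs at least two distinct column values."""
    n = len(P)
    c_mean = sum(c for _, c in P) / n
    r_mean = sum(r for r, _ in P) / n
    s_cc = sum((c - c_mean) ** 2 for _, c in P)
    s_cr = sum((c - c_mean) * (r - r_mean) for r, c in P)
    b1 = s_cr / s_cc           # slope
    b0 = r_mean - b1 * c_mean  # intercept
    return b0, b1


def headline_rows(b0, b1, width):
    """Predicted headline row for each column 0..width-1.
    Note that Python's round() rounds exact halves to the nearest even integer."""
    return [round(b0 + b1 * c) for c in range(width)]


# Hypothetical pixels of a slightly skewed word: the row index decreases
# by one every 12 columns.
P = [(20, 0), (19, 12), (18, 24), (17, 36)]
b0, b1 = fit_headline(P)
rows = headline_rows(b0, b1, 37)
```

Because the fitted line is evaluated at every column, it also bridges gaps where the written headline is broken or missing, and its slope follows the skew of the word.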
3 Experimental Results and Analysis
3.1 Dataset
For experimentation, we used four datasets covering three scripts, namely Bangla, Devanagari, and Gurmukhi. For Bangla, we used CMATERdb version 1.1.1 [5] and the ICDAR 2013 Segmentation Dataset [6]. For Devanagari and Gurmukhi, we used CMATERdb version 1.5.1 [7] and the PHDIndic_11 dataset [8], respectively. A total of 4050 words were used in the present experiments. The entire method was implemented in MATLAB.
3.2 Test Results and Comparative Analysis
This section presents the experimental results and their analysis. A few outputs of the proposed technique are shown in Fig. 4; the last two rows for each script in Fig. 4 illustrate the removal of ineligible pixels caused by upper modifiers, as discussed in Sect. 2.2. Table 1 provides a detailed analysis of the headline estimation performance for each script. According to Table 1, Devanagari and Gurmukhi yield the highest and lowest accuracies, 96.15% and 89.41%, respectively. Across all three scripts, we achieved an overall accuracy of 92.59%.
We compare the performance of the proposed method with that of Sarkar et al. [2]. Their method uses the sum of horizontal run lengths, maximum horizontalness, horizontalness, and verticalness features to identify the headline in handwritten Bangla words before segmentation. It is limited to non-skewed words and produces inaccurate results when the headline is mostly absent. In contrast, our method provides accurate results even when the headline is mostly absent and also handles skewed word images. Table 2 compares our method with [2], and Table 3 gives a visual comparison on a few word images, showing that our method estimates the headline more accurately than [2].
4 Conclusion
In the majority of Indian scripts, most segmentation algorithms require prior knowledge of the headline location to fragment a handwritten word into its constituent characters quickly and efficiently. In the present work, we have proposed a method that uses simple linear regression to estimate the headlines present in handwritten words. The method efficiently detects headlines in three Indian scripts, namely Bangla, Devanagari, and Gurmukhi. It detects headlines in skewed word images and provides accurate results even when the headline is discontinuous or mostly absent.
References
1. Bag, S., Harit, G.: A survey on optical character recognition for Bangla and Devanagari scripts. Sadhana 38(1), 133–168 (2013)
2. Sarkar, R., Das, N., Basu, S., Kundu, M., Nasipuri, M., Basu, D.K.: A two-stage approach for segmentation of handwritten Bangla word images. In: Proceedings of ICFHR, pp. 403–408 (2008)
3. Roy, P.P., Dey, P., Roy, S., Pal, U., Kimura, F.: A novel approach of Bangla handwritten text recognition using HMM. In: Proceedings of ICFHR, pp. 661–666 (2014)
4. Bag, S., Krishna, A.: Character segmentation of Hindi unconstrained handwritten words. In: Barneva, R.P., Bhattacharya, B.B., Brimkov, V.E. (eds.) IWCIA 2015. LNCS, vol. 9448, pp. 247–260. Springer, Cham (2015). doi:10.1007/978-3-319-26145-4_18
5. Sarkar, R., Das, N., Basu, S., Kundu, M., Nasipuri, M., Basu, D.K.: CMATERdb1: a database of unconstrained handwritten Bangla and Bangla-English mixed script document image. IJDAR 15(1), 71–83 (2012). Accessed 8 Feb 2017
6. Stamatopoulos, N., Gatos, B., Louloudis, G., Pal, U., Alaei, A.: ICDAR 2013 handwriting segmentation contest. In: Proceedings of ICDAR, pp. 1402–1406 (2013). Accessed 12 Mar 2017
7. CMATERdb 1.5.1: http://archive.is/xDqG6#selection-621.0-623.41. Accessed 2 Jan 2017
8. Das, N., Halder, C., Obaidullah, S.M., Roy, K., Santosh, K.C.: PHDIndic_11: page-level handwritten document image dataset of 11 official Indic scripts for script identification. Multimedia Tools and Applications, pp. 1–36 (2017)
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Pramanik, R., Bag, S. (2017). Linear Curve Fitting-Based Headline Estimation in Handwritten Words for Indian Scripts. In: Shankar, B., Ghosh, K., Mandal, D., Ray, S., Zhang, D., Pal, S. (eds) Pattern Recognition and Machine Intelligence. PReMI 2017. Lecture Notes in Computer Science, vol 10597. Springer, Cham. https://doi.org/10.1007/978-3-319-69900-4_15
DOI: https://doi.org/10.1007/978-3-319-69900-4_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69899-1
Online ISBN: 978-3-319-69900-4
eBook Packages: Computer Science, Computer Science (R0)