1 Introduction

Segmentation is one of the most consequential phases in optical character recognition. The cursive nature of Indian scripts makes the segmentation task much harder [1]. Most segmentation algorithms for Indian scripts require some prior knowledge about the structure of a handwritten word to efficiently fragment the word into its constituent characters. Zone detection is a widely used strategy for this purpose. Zone detection separates a word into three segments, namely the upper, middle, and lower zones. The upper zone is detected by exploiting the headline, a special feature present in most Indian scripts. Sarkar et al. [2] computed the headline in Bangla words by extracting horizontalness and verticalness features from the words. Roy et al. [3] estimated the headline in Bangla words using the height of the word, horizontal projection analysis, and certain heuristics. Bag and Krishna [4] used the horizontal density row and the local maximum row for detecting headlines in handwritten Hindi words. However, these methods perform poorly when the words are skewed or when the headline is discontinuous or mostly absent. Furthermore, there is a lack of methods capable of handling multiple scripts in a single document.

In the present work, we propose a method that uses simple linear regression for estimating the headlines present in handwritten words. This method efficiently detects headlines in three Indian scripts, namely Bangla, Devanagari, and Gurmukhi. It can be applied to word images extracted from a document comprising multiple headline-based scripts without any prior knowledge about the scripts. The proposed method is able to detect headlines in skewed word images and provides accurate results even when the headline is discontinuous or mostly absent.

The rest of the paper is organized as follows. The proposed methodology is described in Sect. 2. In Sect. 3, the experimental results and analysis are discussed, followed by the conclusion in Sect. 4.

Fig. 1. Horizontal line called \({m\bar{{ a}}tr\bar{{ a}}}\) in Bangla, \(shirorekh\bar{{ a}}\) in Devanagari script, and headline in Gurmukhi script.

2 Proposed Method

Most Indian scripts have a distinctive feature called the headline (also known as \(m\bar{{ a}}tr\bar{{ a}}\) in Bangla and \(shirorekh\bar{{ a}}\) in Devanagari script) (Fig. 1) present in words. All the characters are connected by the headline at the upper portion of a word. This headline is sometimes discontinuous, depending on the individual's handwriting. We propose a strategy that utilises this distinctive feature to estimate the headline present in word images. The method is simple and effective.

2.1 Preprocessing

First, we binarize each gray-level word image (\( \tau _{k} \)) (Fig. 2a) and denote the binarized image as \( \nu (\tau _{k}) \). We use the Rosenfeld and Kak component labelling algorithm to label all the connected components and calculate the size (the total number of pixels) of each connected component in \( \nu (\tau _{k}) \). Next, as a noise normalization step, we remove each connected component that appears in the top three-fourths of \( \nu (\tau _{k}) \) and contains fewer pixels than a threshold \( \rho \) (= 30). We used 150 word images to validate the optimal value of \( \rho \).
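The noise normalization step can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation used in this work: the image is a list of rows with 1 for foreground, components are found with an 8-connected flood fill in place of the Rosenfeld and Kak labelling algorithm, and a component is treated as lying in the top three-fourths only if all of its pixels do. The function name `remove_small_components` is hypothetical.

```python
from collections import deque


def remove_small_components(image, rho=30, top_fraction=0.75):
    """Return a copy of a binary image with small top components erased.

    Assumptions (not from the source): 1 = foreground, 8-connectivity,
    and a component is "in the top part" only if every pixel of it lies
    above row top_fraction * height.
    """
    m, n = len(image), len(image[0])
    limit = top_fraction * m
    seen = [[False] * n for _ in range(m)]
    cleaned = [row[:] for row in image]
    for r0 in range(m):
        for c0 in range(n):
            if image[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Flood fill to collect one connected component.
            component = []
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                component.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < m and 0 <= cc < n
                                and image[rr][cc] == 1 and not seen[rr][cc]):
                            seen[rr][cc] = True
                            queue.append((rr, cc))
            # Erase the component if it is small and confined to the top part.
            in_top = all(r < limit for r, _ in component)
            if in_top and len(component) < rho:
                for r, c in component:
                    cleaned[r][c] = 0
    return cleaned
```

Restricting the removal to the upper part of the image keeps small but legitimate strokes near the bottom of the word, which do not interfere with a top-down search for the headline.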

Fig. 2. Stepwise illustration of headline estimation. (a) Input grayscale image; (b) After binarization and noise normalization; (c) First encountered foreground pixels on equi-distant columns marked with magenta colour; (d) After removal of ineligible pixels; (e) Estimated headline marked with cyan colour.

2.2 Headline Estimation

We take \( \nu (\tau _{k}) \) with dimension m \( \times \) n as input (Fig. 2b) and select q of the n columns in \( \nu (\tau _{k}) \) at a predefined spacing. These q selected columns are denoted as \(\mathcal {Q}=<c_1,c_2,\cdots ,c_q>\). Consecutive selected columns are separated by 12% of the width of \( \nu (\tau _{k}) \); we used a subset of 350 images to validate this optimal percentage. Traversing \( \nu (\tau _{k}) \) from top to bottom, we detect and store the first foreground pixel encountered on each column in \(\mathcal {Q}\). The stored foreground pixels are denoted as \(\mathcal {P}=<p_1,p_2,\cdots ,p_q>\) (Fig. 2c). Each stored pixel \(p_i\) is associated with a row and a column number, denoted as \(p_i\)(r) and \(p_i\)(c) respectively.

We use three sets, namely \(\mathcal {E} \), \(\mathcal {E'} \), and \(\mathcal {I} \), to classify the pixels in \(\mathcal {P} \) as eligible, ineligible, and intermediate respectively. \(\mathcal {E} \) holds pixels that are eligible for further headline estimation. \(\mathcal {E'}\) holds pixels that will be deleted from \(\mathcal {P} \) and not considered in further computation. \(\mathcal {I} \) holds pairs of pixels that are placed there temporarily until they are assigned to \(\mathcal {E} \) or \(\mathcal {E'}\).

For every three consecutive pixels \(p_i\), \(p_{i+1}\), and \(p_{i+2}\) in \(\mathcal {P}\), we evaluate the angle \(\angle p_{i}p_{i+1}p_{i+2}\) (denoted as \(\theta _{q}\)). If \(\theta _{q}\) \(<=\)165\(^{\circ }\), we conclude that one of the three pixels is not a headline pixel. We used a subset of 150 word images to validate 165\(^{\circ }\) as the optimal angle. To determine which pixel among the three is not a headline pixel, we compute \(|p_i(r)-p_{i+1}(r)|\) and \(|p_{i+1}(r)-p_{i+2}(r)|\). Because the selected columns are equi-distant, the column difference between consecutive pixels in \(\mathcal {P}\) is essentially constant, so we take only the row values of the pixels into consideration for distance computation. If \(|p_i(r)-p_{i+1}(r)|> |p_{i+1}(r)-p_{i+2}(r)|\), then we conclude that either \(p_i\) or \(p_{i+1}\) is not a headline pixel, and we store {\(p_i\), \(p_{i+1}\)} in \(\mathcal {I} \) as an intermediate pixel pair. Otherwise, we conclude that either \(p_{i+1}\) or \(p_{i+2}\) is not a headline pixel and store {\(p_{i+1}\), \(p_{i+2}\)} in \(\mathcal {I} \) instead. If a pixel \(p_i\) in \(\mathcal {P} \) is marked as intermediate twice in a single iteration, then we conclude that \(p_i\) is not a headline pixel and transfer \(p_i\) from \(\mathcal {I} \) to \(\mathcal {E'} \), while the pixels paired with \(p_i\) in \(\mathcal {I} \) are removed from \(\mathcal {I} \). Once all the intermediate pixels are marked in a single iteration, we compute the eligible pixels as \(\mathcal {E}\) = \(\mathcal {P}\) – (\(\mathcal {I}~\cup ~\mathcal {E'} \)).
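The column sampling, the top-down pixel search, and the angle test described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the implementation used in this work: sampling is assumed to start at column 0, columns with no foreground pixel are skipped, pixels are (row, column) tuples, and all function names are hypothetical.

```python
import math


def select_columns(width, fraction=0.12):
    """Return equidistant column indices spaced by `fraction` of the width.

    Assumption: sampling starts at column 0; the text does not state
    where the first column lies.
    """
    step = max(1, round(fraction * width))
    return list(range(0, width, step))


def first_foreground_pixels(image, columns):
    """Scan each chosen column top to bottom; keep the first foreground pixel.

    `image` is a list of rows with 1 = foreground. Columns without any
    foreground pixel are skipped (an assumption).
    """
    pixels = []
    for c in columns:
        for r, row in enumerate(image):
            if row[c] == 1:
                pixels.append((r, c))
                break
    return pixels


def angle_deg(p_prev, p_mid, p_next):
    """Angle at p_mid, in degrees, between the rays to p_prev and p_next."""
    ar, ac = p_prev[0] - p_mid[0], p_prev[1] - p_mid[1]
    br, bc = p_next[0] - p_mid[0], p_next[1] - p_mid[1]
    cos_theta = (ar * br + ac * bc) / (math.hypot(ar, ac) * math.hypot(br, bc))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def suspect_pair(pixels, i, threshold=165.0):
    """Return the index pair to place in I for the triple starting at i.

    Returns None when the angle exceeds the threshold, i.e. the three
    pixels are close enough to collinear.
    """
    p0, p1, p2 = pixels[i], pixels[i + 1], pixels[i + 2]
    if angle_deg(p0, p1, p2) > threshold:
        return None
    if abs(p0[0] - p1[0]) > abs(p1[0] - p2[0]):
        return (i, i + 1)
    return (i + 1, i + 2)
```

For three pixels on the same row the angle is 180 degrees, so no pair is flagged; a pixel that sits well above its neighbours produces a sharp angle and is flagged together with the neighbour from which it differs most in row value.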

For every pixel pair {\( p_i \), \( p_{i+1} \)} in \(\mathcal {I} \), we compute the row-wise differences \(df_{p_i}\) and \(df_{p_{i+1}}\) of \( p_{i} \) and \( p_{i+1} \) with every pixel in \(\mathcal {E}\), and take the larger of the two differences as \(max_{df}\). A non-headline pixel will typically have a greater row difference with headline pixels than headline pixels have with one another. So, the pixel in the pair {\( p_i \), \( p_{i+1} \)} that accounts for \(max_{df}\) most often is transferred from \(\mathcal {I} \) to \(\mathcal {E'}\), while the other is transferred to \(\mathcal {E}\). Once all the pixel pairs in \(\mathcal {I} \) are checked, the pixels belonging to \(\mathcal {E'} \) are removed from \(\mathcal {P} \) (Fig. 2d), and \(\mathcal {E} \), \(\mathcal {I} \), and \(\mathcal {E'} \) are all emptied. This procedure is repeated until no three consecutive pixels in \(\mathcal {P} \) form an angle less than or equal to 165\(^{\circ }\). We remove ineligible pixels from \(\mathcal {P} \) so that the headline estimation is not affected by upper modifiers and certain consonants that appear above the headline in a word; Bangla script, for instance, contains such modifiers and consonants. Examples of headline estimation for words with such modifiers in Indian scripts are shown in the next section.
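One possible reading of the pair-resolution step is sketched below in Python. It is an interpretation, not the implementation used in this work: each eligible pixel casts a vote for whichever pixel of the pair has the larger row difference from it, and the pixel with more votes is treated as ineligible. The tie-breaking rule (the first pixel of the pair is marked ineligible) and the function name `resolve_pair` are assumptions introduced here.

```python
def resolve_pair(pair, eligible):
    """Decide which pixel of an intermediate pair is ineligible.

    `pair` is ((row_a, col_a), (row_b, col_b)); `eligible` is a list of
    (row, col) pixels from E. Returns (ineligible_pixel, eligible_pixel).
    """
    a, b = pair
    votes_a = votes_b = 0
    for e_row, _ in eligible:
        df_a = abs(a[0] - e_row)  # row difference of a with this eligible pixel
        df_b = abs(b[0] - e_row)  # row difference of b with this eligible pixel
        if df_a > df_b:
            votes_a += 1  # a accounts for the larger difference
        elif df_b > df_a:
            votes_b += 1  # b accounts for the larger difference
    # Assumption: a tie in votes marks the first pixel as ineligible.
    if votes_a >= votes_b:
        return a, b
    return b, a
```

With a pair whose first pixel lies far above a set of eligible pixels on nearly the same row, every vote goes to the first pixel, so it is the one moved to the ineligible set.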

Fig. 3. Stepwise illustration of ineligible pixel removal. (a) Angle \(\angle p_{1}p_{2}p_{3} <=\) 165\(^{\circ }\) and \(|p_{1}(r)-p_{2}(r)|> |p_{2}(r)-p_{3}(r)|\), so {\(p_{1}\), \(p_{2}\)} is stored in \(\mathcal {I} \); (b) For the next three pixels, angle \(\angle p_{2}p_{3}p_{4} <= \) 165\(^{\circ }\) and \(|p_{2}(r)-p_{3}(r)|> |p_{3}(r)-p_{4}(r)|\), so {\(p_{2}\), \(p_{3}\)} is stored in \(\mathcal {I} \), but two consecutive pixel pairs contain the same pixel, i.e., \(p_{2}\); (c) As \(p_{2}\) appears in two consecutive pairs in \(\mathcal {I} \), \(p_{2}\) is transferred to \(\mathcal {E'} \), while the two pixels paired with it, i.e., \(p_{1}\) and \(p_{3}\), are removed from \(\mathcal {I} \); (d) Pixels in \(\mathcal {E'} \) are removed from \(\mathcal {P}\).

We use the word in Fig. 3 as a working example to demonstrate the proposed methodology. Due to the presence of a consonant that rises above the headline, the second pixel \(p_{2}\) in \(\mathcal {P} \) is marked much higher than the position of the headline (Fig. 3a). As a result, for the first three pixels in \(\mathcal {P} \), angle \(\angle p_{1}p_{2}p_{3}\) is \(<=\) 165\(^{\circ }\) and \(|p_{1}(r)-p_{2}(r)|> |p_{2}(r)-p_{3}(r)|\). We conclude that either \(p_{1}\) or \(p_{2}\) is a non-headline pixel and store {\(p_{1}\), \(p_{2}\)} in \(\mathcal {I} \) as a pixel pair (Fig. 3a). When we shift one pixel to the right and consider the next three pixels, angle \(\angle p_{2}p_{3}p_{4}\) is again \(<=\) 165\(^{\circ }\) and \(|p_{2}(r)-p_{3}(r)|> |p_{3}(r)-p_{4}(r)|\). So, we conclude that either \(p_{2}\) or \(p_{3}\) is a non-headline pixel and store {\(p_{2}\), \(p_{3}\)} in \(\mathcal {I} \) (Fig. 3b). As \(p_{2}\) repeats in two consecutive pixel pairs in \(\mathcal {I} \), we infer that \(p_{2}\) is a non-headline pixel. As a result, we transfer \(p_{2}\) from \(\mathcal {I} \) to \(\mathcal {E'} \) and remove the pixels paired with \(p_{2}\), i.e., \(p_{1}\) and \(p_{3}\), from \(\mathcal {I} \) (Fig. 3c). We then check all the remaining consecutive pixels in \(\mathcal {P} \). Once checking completes, we remove the ineligible pixels in \(\mathcal {E'} \) from \(\mathcal {P} \) (Fig. 3d).

Now, we predict the row values \(\hat{\mathcal {P}(r)}\) based on the row and column values of pixels in \(\mathcal {P} \) using the following equation:

$$\begin{aligned} \hat{\mathcal {P}(r)} = b_0 + b_{1} \times \mathcal {P}(c) \end{aligned}$$
(1)

where,

$$\begin{aligned} b_1 = \frac{\sum _{i=1}^{|\mathcal {P} |} (p_{i}(c) - \overline{p(c)})(p_{i}(r) - \overline{p(r)})}{\sum _{i=1}^{|\mathcal {P} |} (p_{i}(c) - \overline{p(c)})^{2}} \text {, } \end{aligned}$$
$$\begin{aligned} b_0 = \overline{p(r)} - b_1 \times \overline{p(c)} \text {, } \end{aligned}$$
$$\begin{aligned} \overline{p(c)} = \frac{\sum _{i=1}^{|\mathcal {P} |} p_{i}(c)}{|\mathcal {P} |} \text { and } \overline{p(r)} = \frac{\sum _{i=1}^{|\mathcal {P} |} p_{i}(r)}{|\mathcal {P} |}\text {.} \end{aligned}$$

We use the polyfit function in Matlab to compute these coefficients. Based on the \(\hat{\mathcal {P}(r)}\) and \(\mathcal {P}(c)\) values, we draw a regression line, which gives the final estimated headline of each word (Fig. 2e).
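For readers without Matlab, the least-squares fit of Eq. (1) can be sketched in pure Python as below. This is a stand-in for a degree-1 polyfit call, not the code used in this work; it computes the slope as the summed cross-deviations of columns and rows divided by the summed squared column deviations. The function names `fit_headline` and `predict_row` are hypothetical, and pixels are assumed to be (row, column) tuples.

```python
def fit_headline(pixels):
    """Fit row = b0 + b1 * col by ordinary least squares.

    `pixels` is a list of (row, col) tuples remaining in P after the
    ineligible pixels have been removed. Returns (b0, b1).
    """
    k = len(pixels)
    mean_c = sum(c for _, c in pixels) / k
    mean_r = sum(r for r, _ in pixels) / k
    # Summed cross-deviations and summed squared column deviations.
    s_cr = sum((c - mean_c) * (r - mean_r) for r, c in pixels)
    s_cc = sum((c - mean_c) ** 2 for _, c in pixels)
    if s_cc == 0:
        raise ValueError("need pixels from at least two distinct columns")
    b1 = s_cr / s_cc
    b0 = mean_r - b1 * mean_c
    return b0, b1


def predict_row(b0, b1, col):
    """Predicted headline row at a given column, as in Eq. (1)."""
    return b0 + b1 * col
```

Evaluating `predict_row` at the first and last columns of the word image gives two end points through which the estimated headline can be drawn; a nonzero slope `b1` is what lets the line follow a skewed word.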

3 Experimental Results and Analysis

3.1 Dataset

For experimentation, we have used four datasets covering three different scripts, namely Bangla, Devanagari, and Gurmukhi. For Bangla script, we have used the Cmaterdb dataset version 1.1.1 [5] and the ICDAR 2013 Segmentation Dataset [6]. For the Devanagari and Gurmukhi scripts, we have used the Cmaterdb dataset version 1.5.1 [7] and the PHDIndic_11 [8] dataset respectively. A total of 4050 words are used in the current experiments. The entire implementation is in Matlab.

Fig. 4. Test results on different Indian scripts. First column: Word images; Second column: First encountered foreground pixels on equi-distant columns marked with cyan colour; Third column: Eligible pixels are kept while ineligible pixels are discarded; Fourth column: Estimated headline marked with magenta colour.

3.2 Test Results and Comparative Analysis

This section presents the experimental results and analysis of the proposed work. A few outputs of the proposed technique are shown in Fig. 4. The last two rows for each script in Fig. 4 illustrate the removal of ineligible pixels caused by upper modifiers, as discussed in the previous section. A detailed analysis of the headline estimation performance achieved in each script is provided in Table 1. As per the tabulated results, the Devanagari and Gurmukhi scripts yield the highest and lowest accuracy, 96.15% and 89.41% respectively. We achieved an overall accuracy of 92.59% across all three scripts.

Table 1. Headline estimation accuracy achieved in different Indian scripts.
Table 2. Comparison of our proposed method with Sarkar et al. [2].
Table 3. Headline estimation comparison of few word images of our proposed method with Sarkar et al. [2]. Input for the last row is a synthetically oriented word image at 30\(^{\circ }\).

We compare the performance of our proposed method with that of Sarkar et al. [2]. Their method utilises the sum of the lengths of horizontal runs, maximum horizontalness, horizontalness, and verticalness features to identify the headline in handwritten Bangla words before segmentation is performed. It is limited to non-skewed words and gives inaccurate results when the headline is mostly absent. Our proposed method provides accurate results even when the headline is mostly absent and can handle skewed word images as well. A comparison of our proposed method with [2] is provided in Table 2. We also provide a visual comparison with [2] on a few word images in Table 3, demonstrating that our proposed method provides more accurate headline estimation than [2].

4 Conclusion

Most segmentation algorithms require some prior knowledge about the location of the headline to swiftly and efficiently fragment a handwritten word into its constituent characters in the majority of Indian scripts. In the present work, we have proposed a method that uses simple linear regression for estimating the headline present in handwritten words. This method efficiently detects headlines in three Indian scripts, namely Bangla, Devanagari, and Gurmukhi. The proposed method is able to detect headlines in skewed word images and provides accurate results even when the headline is discontinuous or mostly absent.