I. Introduction

A large volume of research has been dedicated to OCR systems. A number of algorithms [1–6] are available for this purpose, and many commercial OCR systems [7, 8] are now on the market, but most of these systems can recognize only text images with straight (horizontal) text-lines and are designed for a specific script or language. In addition, limitations in the source materials and in character formatting make feature extraction and recognition difficult.

The increasing demand for stylistic text recognition has attracted researchers to design and develop new algorithms that can handle such text, owing to its use in many potential applications, including container identification mark recognition [9], vehicle license plate recognition [10], text recognition in video and images [11–14], image retrieval from databases [15], intelligent transport systems [16], robotics [17], and text translation services that assist tourists and foreigners facing a language barrier [18]. Stylistic text, as shown in Figure 1, can be found on greeting cards, front covers of books, organization logos, stamp seals, maps, engineering drawings, advertisements, newspaper front pages, hallmark cards, signboards, hoardings, archives, etc. Such text contains multi-oriented, multi-font-size, and curved text-lines or words, which result in errors in the recognition process. There are two ways to recognize such documents: one is to develop algorithms that can recognize words and text-lines in their actual format (curved, etc.), and the other is to convert curved text-lines into straight text-lines, preferably oriented horizontally. In the latter case, the straight text-lines must be segmented into words or characters before recognition. The main advantage of the proposed approach, which follows the second route, is that no specific feature extraction and classification technique or dataset is required for the recognition of such documents. In view of this, text normalization plays an important role in the recognition of stylistic text.

Figure 1

Sample of curved text-lines.

II. Related works

Many works are available on the recognition of document images having straight text-lines and words [2, 5, 6]. However, only a few works in the literature address the recognition of stylistic documents [19–44].

In 1991, Xie et al. [19] proposed a pattern recognition system that is invariant to translation, scale change, and rotation of the pattern; a recognition accuracy of 97% was obtained on the 10 Arabic numerals. In the same year, Tang et al. [20] used a translation-ring-projection algorithm to handle multi-oriented English alphabets.

In 2000, Adam et al. [21] proposed an approach for recognizing multi-oriented and multi-scaled characters in engineering drawings, in which the Fourier-Mellin transform is used to recognize the characters. This approach is limited to the recognition of a few characters and is time-consuming. In 2001, Yang et al. [22] proposed a three-stage approach for multi-oriented Chinese character recognition in which the features are mainly based on geometric measures of the foreground pixels of the characters. It is limited to the Chinese language only.

In 2003, Hase et al. [23] proposed an approach that handles multi-oriented characters according to their type (inclined, horizontal, vertical, curved, etc.): the characters are first realigned horizontally and then recognized. The main drawback of this approach is the distortion caused by the realignment of curved text. For rotated and/or inclined English character recognition, Hase et al. [23] used a parametric eigen-space-based approach. This method is also limited to English characters and does not address variations in font style and size or multiple scripts.

In 2005, Pal et al. [24] proposed a recognition-based approach to handle Indian multi-oriented and curved text. It uses the water reservoir concept to segment characters from stylistic documents without any skew correction; the individual characters are then recognized. This approach is limited to Bangla and Devanagari script text. In 2006, Hayashi et al. [25] proposed a rotation-invariant Arabic numeral recognition system in which a numeral is divided, using a thinning algorithm, into elementary sub-patterns such as straight lines, C-shaped lines, and 0-shaped lines, and is then recognized from features of the sub-patterns such as curvature, angle information, length, and arc length. In the same year, Pal et al. [26] proposed a method for the recognition of multi-oriented and multi-sized English characters based on the modified quadratic discriminant function (MQDF). The main drawback of this approach is that it cannot distinguish similar-looking characters such as ‘b’ and ‘q’, ‘p’ and ‘d’, or ‘n’ and ‘u’, because it relies on rotation-invariant features.

In 2007, Monwar et al. [27] proposed an approach for recognizing printed alphanumeric characters at different angles, in which each character is described by a small set of two-dimensional characteristic views at different angles for feature extraction. In 2008, Roy et al. [28] proposed a support vector machine (SVM)-based scheme for the recognition of English characters in graphical documents containing multi-scale and multi-oriented text.

In 2010, Pal et al. [29] proposed an approach for the recognition of multi-oriented Bangla and Devanagari characters. Although it is also a recognition-based system, it fails on confusable characters of the Bangla and Devanagari scripts.

In 2011, Chiang et al. [30] proposed a general text recognition technique that handles non-homogeneous text by exploiting dynamic character grouping criteria based on the character sizes and the maximum desired string curvature. In the same year, Shivakumara et al. [31] proposed an approach to detect multi-oriented text in videos. The input image is first filtered with a Fourier-Laplacian filter, and K-means clustering is then used to identify candidate text regions based on the maximum difference. The skeleton of each connected component helps to separate the different text strings from each other. Finally, text string straightness and edge density are used for false-positive elimination. The method of Shivakumara et al. [31] is limited to the orientation of English-language text.

All the approaches discussed above are based on the recognition of multi-oriented characters and are limited to specific scripts. No approach in the literature is based on the straightening of curved text-lines or words while also being script independent. In contrast, this work proposes an approach for curved text-line straightening that can handle multiple font sizes and types and multi-script text-lines in a single document.

III. Proposed approach

In this work, the images to be processed are captured by a scanner or a camera. The Gatos method [32] is used for thresholding the text images. Figure 2 shows a sample curved text-line image.

Figure 2

Curved text-line.

Curved or stylistic text in document images poses problems for segmentation and recognition, so it is important to straighten the text-lines before recognition. The proposed method is based on the following steps:

Step 1

Apply the morphological dilation operation [33] to the text document using a disk-shaped structuring element of radius 17, so that regions grow in the form of circular bubbles until all foreground objects (black pixels) are covered within a single boundary. The radius of the structuring element is a manually selected parameter in our work. It can vary from image to image, but on the basis of our experiments, 17 was found to be the most appropriate value. Figure 3 shows the curved text image after the dilation operation.

Figure 3

The result after dilation using a circular structuring element of radius 17.
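
The effect of Step 1 can be illustrated with a minimal pure-Python sketch of binary dilation with a disk-shaped structuring element. It assumes that text pixels are stored as 1 and background as 0, and it uses small radii on a toy grid rather than the radius 17 used in the experiments. The function names `disk_offsets` and `dilate` are hypothetical, and the disk is a plain Euclidean disk, which may differ slightly from the structuring element produced by a particular toolbox.

```python
def disk_offsets(radius):
    """Offsets (dy, dx) of all pixels inside a Euclidean disk of the given radius."""
    r2 = radius * radius
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= r2]


def dilate(image, radius):
    """Binary dilation; assumes 1 = text (foreground) and 0 = background."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    offsets = disk_offsets(radius)
    for y in range(rows):
        for x in range(cols):
            if image[y][x]:
                # Stamp the disk around every foreground pixel.
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols:
                        out[ny][nx] = 1
    return out


# Toy input: two isolated "character" pixels, five columns apart, on a 5 x 9 grid.
img = [[0] * 9 for _ in range(5)]
img[2][1] = 1
img[2][6] = 1

grown = dilate(img, 3)      # radius large enough to bridge the gap
too_small = dilate(img, 1)  # radius too small, the two blobs stay separate
```

In this toy case, radius 3 merges the two pixels into one connected region along the middle row, whereas radius 1 leaves a gap between them, which is the reason the radius has to be matched to the spacing in the image.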

Step 2

To approximate the black region of the curved text-line, we need pixels in an order that follows a path coinciding with the curve. This is accomplished by thinning, which converts the region into one-pixel-thick lines. Since a general thinning method produces many unwanted edges along the curve, as shown in Figure 4a, the thinning method given in [34] is applied to overcome this shortcoming. Figure 4b shows the result of the smoothed thinning. Thinning is based on the following steps:

Figure 4

Thinning. (a) Thinning with unwanted edges. (b) Smoothed thinning.

  • The image is inverted so that black (text) pixels are represented by ‘1’ and white pixels by ‘0’, which is the convention expected by the MATLAB morphological operations.

  • Erosion is applied to the dilated image using a disk-shaped structuring element of size ‘17’.

  • The morphological thinning operation is applied to the eroded image with ‘10’ iterations.

  • The output image is inverted again so that the subsequent steps of curve straightening work properly.
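
The thinning method of [34] is not reproduced here. As an illustration of the thinning stage only, the sketch below uses the well-known Zhang-Suen algorithm as a stand-in, which is an assumption and not necessarily the method the authors used. Text pixels are assumed to be 1, the image is assumed to have a one-pixel background border, and the default `max_iterations=10` mirrors the iteration count listed above. All function names are hypothetical.

```python
def neighbours(img, y, x):
    """8-neighbours P2..P9 of (y, x), clockwise, starting from the pixel above."""
    return [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1], img[y + 1][x + 1],
            img[y + 1][x], img[y + 1][x - 1], img[y][x - 1], img[y - 1][x - 1]]


def transitions(n):
    """Number of 0 -> 1 transitions in the circular sequence P2, ..., P9, P2."""
    return sum(1 for a, b in zip(n, n[1:] + n[:1]) if a == 0 and b == 1)


def zhang_suen(image, max_iterations=10):
    """Zhang-Suen thinning of a binary image (1 = foreground)."""
    img = [row[:] for row in image]
    rows, cols = len(img), len(img[0])
    for _ in range(max_iterations):
        changed = False
        for step in (0, 1):  # the two sub-iterations of the algorithm
            to_clear = []
            for y in range(1, rows - 1):
                for x in range(1, cols - 1):
                    if not img[y][x]:
                        continue
                    n = neighbours(img, y, x)
                    p2, p3, p4, p5, p6, p7, p8, p9 = n
                    if not (2 <= sum(n) <= 6 and transitions(n) == 1):
                        continue
                    if step == 0:
                        removable = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        removable = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if removable:
                        to_clear.append((y, x))
            # Pixels are cleared only after the whole pass (parallel update).
            for y, x in to_clear:
                img[y][x] = 0
            changed = changed or bool(to_clear)
        if not changed:
            break
    return img


# Toy input: a horizontal bar, 3 pixels thick, on a 7 x 12 grid.
bar = [[1 if 2 <= y <= 4 and 2 <= x <= 9 else 0 for x in range(12)]
       for y in range(7)]
skeleton = zhang_suen(bar)
```

On this toy bar the result is a one-pixel-thick horizontal segment along the middle row, slightly shorter than the bar itself, which is the kind of ordered path that the curve fitting in the next step needs.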

Step 3

In this step, curve fitting is applied to the result obtained from the previous step. A B-spline curve [35–41] is used instead of a polynomial curve to approximate the pixel data, because a polynomial curve fails to approximate a complex curve such as the one shown in Figure 2. B-spline and Bezier curves have a very similar form, but a B-spline curve carries more information, namely a knot vector and a degree in addition to the control points. Moreover, fitting even a simple curve with a high-degree polynomial suffers from Runge’s phenomenon [42–44], so the accuracy does not always increase when the degree used to approximate the pixel data is raised. Hence, the proposed approach uses a B-spline curve, which is free from Runge's phenomenon even at higher degrees. A B-spline curve is described as follows:

Given n + 1 control points $P_0, P_1, \ldots, P_n$ and a knot vector $U = \{u_0, u_1, \ldots, u_m\}$, the B-spline curve of degree p defined by these control points and the knot vector U is

$$C(u) = \sum_{i=0}^{n} N_{i,p}(u)\,P_i \tag{1}$$

where the $N_{i,p}(u)$ are the B-spline basis functions of degree p.

The degree of the B-spline basis functions is p. The i-th B-spline basis function of degree p, written as $N_{i,p}(u)$, is defined recursively as follows:

$$N_{i,0}(u) = \begin{cases} 1 & \text{if } u_i \le u < u_{i+1} \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

$$N_{i,p}(u) = \frac{u - u_i}{u_{i+p} - u_i}\,N_{i,p-1}(u) + \frac{u_{i+p+1} - u}{u_{i+p+1} - u_{i+1}}\,N_{i+1,p-1}(u) \tag{3}$$
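
The recursion of Eqs. (2) and (3) translates almost directly into code. The following is a minimal sketch, not an optimized evaluator: the function name `basis`, the example knot vector, and the convention that a term with a zero denominator is taken as zero (the usual 0/0 = 0 rule for repeated knots) are assumptions made for illustration.

```python
def basis(i, p, u, knots):
    """B-spline basis function N_{i,p}(u) by the recursion of Eqs. (2) and (3).

    Terms whose denominator is zero (repeated knots) are treated as zero.
    """
    if p == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + p] - knots[i]
    if d1 > 0:
        left = (u - knots[i]) / d1 * basis(i, p - 1, u, knots)
    d2 = knots[i + p + 1] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + p + 1] - u) / d2 * basis(i + 1, p - 1, u, knots)
    return left + right


# Example: degree p = 2 with 4 control points (n = 3), so m = n + p + 1 = 6.
# The clamped knot vector below is an illustrative choice.
knots = [0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0]
p = 2
weights = [basis(i, p, 0.25, knots) for i in range(4)]
```

At any u inside the knot range the weights are non-negative and sum to one, so each curve point of Eq. (1) is a weighted average of the control points. Because Eq. (2) uses a half-open interval, this sketch returns zero for every basis function at the last knot (u = 1 here), so the end point has to be handled separately.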

Suppose we have n + 1 data points $D_0, D_1, \ldots, D_n$ and want to find a B-spline curve that follows the shape of the data polygon without necessarily passing through the data points. Two more inputs are then needed: the number of control points, h + 1, and the degree p, where n > h ≥ p ≥ 1. With these two inputs, a set of parameters and a knot vector can be determined. Let the parameters be $t_0, t_1, \ldots, t_n$; there is one parameter per data point. The approximating B-spline curve of degree p is then given by

$$C(u) = \sum_{i=0}^{h} N_{i,p}(u)\,P_i \tag{4}$$

where $P_0, P_1, \ldots, P_h$ are the h + 1 unknown control points.

Requiring the curve to pass through the first and last data points, $D_0 = C(0) = P_0$ and $D_n = C(1) = P_h$, leaves only h − 1 unknown control points. Taking this into consideration, the curve equation becomes

$$C(u) = N_{0,p}(u)\,D_0 + \sum_{i=1}^{h-1} N_{i,p}(u)\,P_i + N_{h,p}(u)\,D_n \tag{5}$$

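The control points $P_1, \ldots, P_{h-1}$ are found by the least square method, i.e., such that the sum of all squared distances between the data points and the curve,

$$f(P_1, \ldots, P_{h-1}) = \sum_{k=1}^{n-1} \left| D_k - C(t_k) \right|^2 \tag{6}$$

is minimized. Writing $Q_k = D_k - N_{0,p}(t_k)\,D_0 - N_{h,p}(t_k)\,D_n$ and setting the partial derivative of f with respect to each $P_g$ to zero gives

$$\sum_{k=1}^{n-1} N_{g,p}(t_k) \sum_{i=1}^{h-1} N_{i,p}(t_k)\,P_i = \sum_{k=1}^{n-1} N_{g,p}(t_k)\,Q_k \tag{7}$$

Since we have h − 1 variables, g runs from 1 to h − 1 and there are h − 1 such equations. With

$$P = \begin{bmatrix} P_1 \\ P_2 \\ \vdots \\ P_{h-1} \end{bmatrix} \tag{8}$$

$$Q = \begin{bmatrix} \sum_{k=1}^{n-1} N_{1,p}(t_k)\,Q_k \\ \sum_{k=1}^{n-1} N_{2,p}(t_k)\,Q_k \\ \vdots \\ \sum_{k=1}^{n-1} N_{h-1,p}(t_k)\,Q_k \end{bmatrix} \tag{9}$$

$$N = \begin{bmatrix} N_{1,p}(t_1) & N_{2,p}(t_1) & \cdots & N_{h-1,p}(t_1) \\ N_{1,p}(t_2) & N_{2,p}(t_2) & \cdots & N_{h-1,p}(t_2) \\ \vdots & \vdots & & \vdots \\ N_{1,p}(t_{n-1}) & N_{2,p}(t_{n-1}) & \cdots & N_{h-1,p}(t_{n-1}) \end{bmatrix} \tag{10}$$

the system of linear equations can be rewritten as

$$(N^T N)\,P = Q \tag{11}$$

$$P = (N^T N)^{-1}\,Q \tag{12}$$

Since N and Q are known, solving this system of linear equations for P gives the desired control points.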
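
The least-squares construction of Eqs. (4) to (12) can be sketched in pure Python as follows. The text does not state how the parameters and the knot vector are chosen, so uniform parameters t_k = k/n and a clamped knot vector with uniformly spaced interior knots are assumptions of this sketch, as are the function names `basis`, `solve`, and `fit_bspline`. The normal equations are solved by plain Gaussian elimination instead of forming the inverse of Eq. (12) explicitly.

```python
import math


def basis(i, p, u, knots):
    """B-spline basis function N_{i,p}(u); zero-denominator terms count as zero."""
    if p == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + p] - knots[i]
    if d1 > 0:
        left = (u - knots[i]) / d1 * basis(i, p - 1, u, knots)
    d2 = knots[i + p + 1] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + p + 1] - u) / d2 * basis(i + 1, p - 1, u, knots)
    return left + right


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_bspline(data, h, p):
    """Return (control points P_0..P_h, knot vector) for 2-D data points."""
    n = len(data) - 1
    t = [k / n for k in range(n + 1)]  # uniform parameters (assumption)
    interior = [j / (h - p + 1) for j in range(1, h - p + 1)]
    knots = [0.0] * (p + 1) + interior + [1.0] * (p + 1)  # clamped (assumption)
    # Matrix N of Eq. (10): rows k = 1..n-1, columns i = 1..h-1.
    N = [[basis(i, p, t[k], knots) for i in range(1, h)] for k in range(1, n)]
    NtN = [[sum(row[a] * row[b] for row in N) for b in range(h - 1)]
           for a in range(h - 1)]
    # End control points are fixed to the first and last data points.
    control = [list(data[0])] + [[0.0, 0.0] for _ in range(h - 1)] + [list(data[n])]
    for axis in (0, 1):
        # Q_k = D_k - N_{0,p}(t_k) D_0 - N_{h,p}(t_k) D_n
        q = [data[k][axis]
             - basis(0, p, t[k], knots) * data[0][axis]
             - basis(h, p, t[k], knots) * data[n][axis]
             for k in range(1, n)]
        rhs = [sum(N[k][g] * q[k] for k in range(n - 1)) for g in range(h - 1)]  # Eq. (9)
        for i, value in enumerate(solve(NtN, rhs), start=1):  # Eq. (11)
            control[i][axis] = value
    return control, knots


# Example: 21 points on a half circle, fitted with 6 control points (h = 5), degree 3.
arc = [(math.cos(math.pi * (1 - k / 20)), math.sin(math.pi * (1 - k / 20)))
       for k in range(21)]
control, knots = fit_bspline(arc, h=5, p=3)
```

The x and y coordinates share the same matrix N, so the normal matrix is built once and only the right-hand side changes per coordinate. In practice the skeleton pixels from Step 2, taken in path order, would play the role of the data points.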

Step 4

Stylistic documents may contain more than one curved or multi-oriented text-line, as shown in Figure 1. Before the text-lines are straightened, the individual text-lines must be extracted. Multi-text-line segmentation is handled in the following manner:

A. Dilation is applied to each text-line to cover all the foreground pixels within a black boundary. Figure 5 shows the dilation of the curved multi-text-lines using a circular structuring element of radius 17. The size of the structuring element is chosen on the basis of the experiments performed in this work.

B. A modified flood fill algorithm is applied to find the number of different text-lines in the image and to separate them from the input image. It marks the pixels of each region so that all the pixels of the same region receive the same number. It then creates a blank image for each region and copies the pixels of the original (undilated) image to the image of the corresponding region. Hence, separate images are obtained for separate text-lines. The segmented images are shown in Figure 6. Steps 1 to 4 described earlier are then applied to straighten each image obtained from the segmentation process.

Figure 5

Dilation on the text-lines.

Figure 6

Segmented English text-lines with different font styles. (a) Text-line 1. (b) Text-line 2. (c) Text-line 3.

However, the segmentation step produces errors in the extraction of individual text-lines from multi-line images in the following situations, which arise during the morphological operations:

  • The inter-character spacing in a stylistic text-line is large, so that dilation may not merge its characters into a single region. However, the text image to be processed is assumed to be intended for isolated character recognition, so a large inter-character spacing does not affect the recognition rate much.

  • Two or more text-lines are very close to each other, which results in merged text-lines. Because of this limitation of morphology, our experiments use only images that contain a sufficient gap between text-lines.

The following pseudo code outlines the modified flood fill algorithm.
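
As an illustration of the region-labeling idea described in item B, a minimal Python sketch is given here. It is not the authors' exact algorithm: the use of 8-connectivity, a breadth-first queue, and the names `label_regions` and `split_text_lines` are assumptions. Text pixels are assumed to be 1 in both the original and the dilated image.

```python
from collections import deque


def label_regions(mask):
    """Give every 8-connected foreground region of `mask` its own label 1, 2, ..."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for y in range(rows):
        for x in range(cols):
            if not mask[y][x] or labels[y][x]:
                continue
            count += 1  # unlabeled foreground pixel found: start a new region
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def split_text_lines(original, dilated):
    """Copy the pixels of the undilated image into one blank image per region."""
    labels, count = label_regions(dilated)
    rows, cols = len(original), len(original[0])
    images = [[[0] * cols for _ in range(rows)] for _ in range(count)]
    for y in range(rows):
        for x in range(cols):
            if original[y][x] and labels[y][x]:
                images[labels[y][x] - 1][y][x] = 1
    return images


# Toy input: two "text-lines" whose characters touch only after dilation.
original = [[1, 0, 1, 0, 1],
            [0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0],
            [0, 1, 0, 1, 0]]
dilated = [[1, 1, 1, 1, 1],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [1, 1, 1, 1, 1]]
line_images = split_text_lines(original, dilated)
```

Labeling is done on the dilated image, where each text-line forms a single region, while the output images receive the pixels of the undilated image, so the characters keep their original shape.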

In this step, the least square method [45] is used to approximate the pixel data by a B-spline curve. Because the sample space is discrete, the output image may suffer from aliasing effects, as shown in Figures 7 and 8a, which cause serrated character edges in the output image (Figure 8a). To remove the artifacts produced by aliasing, anti-aliased pixels are computed with bilinear interpolation [33], as shown in Figure 8b. Figure 9 shows the final smoothed output of the proposed approach for straightening the curved text-line.
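
Bilinear interpolation itself can be sketched in a few lines. The sketch below only shows how a grey value is read at a real-valued position; the position (x, y) is assumed to be the non-integer location obtained when a pixel of the straightened output is mapped back onto the curved input, and that mapping is not shown. The function name `bilinear` and the clamping at the image border are assumptions.

```python
import math


def bilinear(image, x, y):
    """Grey value at the real-valued position (x, y); x is the column, y the row.

    Positions outside the image are clamped to the border.
    """
    rows, cols = len(image), len(image[0])
    x = min(max(x, 0.0), cols - 1.0)
    y = min(max(y, 0.0), rows - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom


# Toy 2 x 2 grey-level image.
patch = [[0, 100],
         [100, 200]]
centre = bilinear(patch, 0.5, 0.5)
```

Rounding the position to the nearest pixel instead would copy a single grey value and produce the serrated edges described above, whereas the weighted average of the four surrounding pixels yields intermediate grey levels along character edges.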

Figure 7

Final output of curved text-line straightening approach.

Steps 1 to 4 are applied to the text-lines obtained by the modified flood fill technique, and the results are shown in Figure 10.

Figure 8

Sample with jaggies and after applying anti-aliasing. (a) Output with jaggies. (b) Smoothed edges after applying anti-aliasing.

Figure 9

Smoothed straight text-line.

Figure 10

The straightened text-lines. (a) Text-line 1. (b) Text-line 2. (c) Text-line 3.

IV. Experimental results and discussions

The proposed straightening algorithm was tested on 140 images containing a variety of stylistic multi-oriented text collected from newspapers, books, notebooks, journal articles, magazines, maps, engineering drawings, hoardings, and sign and notice boards. Some images in the dataset contain other information along with the text. We therefore manually separated the multi-oriented text from the other objects present in the images with the help of MS Paint, as shown in Figure 11.

Figure 11

Sample images of the dataset used. (a) Original images of the dataset. (b) Manually cropped images used in the actual experiments.

The method is evaluated on the basis of visual perception and the mean square error. The mean square error is calculated by fitting a line to the input and output images, using the following equations.

The least square line uses a straight line

$$y = a + bx \tag{13}$$

to approximate the given set of data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where n ≥ 2.

The best-fitting line f(x) has the least square error, i.e.,

$$\sum_{i=1}^{n} \left[ y_i - f(x_i) \right]^2 = \sum_{i=1}^{n} \left[ y_i - (a + b x_i) \right]^2 = \min \tag{14}$$
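
The evaluation measure can be sketched as follows, using the closed-form solution of the least-squares line of Eqs. (13) and (14). Which pixels are used as data points is not stated, so the sketch simply takes a list of (x, y) coordinates. The helper `error_removed_percent` is hypothetical: it assumes that the percentage of error removed is the relative drop in mean square error between input and output, which is one plausible reading and is not confirmed by the text.

```python
def fit_line(points):
    """Least-squares coefficients (a, b) of y = a + b x."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b


def mean_square_error(points):
    """Mean squared vertical distance of the points from their fitted line."""
    a, b = fit_line(points)
    return sum((y - (a + b * x)) ** 2 for x, y in points) / len(points)


def error_removed_percent(mse_in, mse_out):
    """Relative reduction of the mean square error (assumed definition)."""
    return 100.0 * (mse_in - mse_out) / mse_in


# Toy data: a bent "text-line" and its perfectly straightened version.
curved = [(0, 0), (1, 1), (2, 0)]
straight = [(0, 1), (1, 1), (2, 1)]
mse_curved = mean_square_error(curved)
mse_straight = mean_square_error(straight)
removed = error_removed_percent(mse_curved, mse_straight)
```

A perfectly straight line of pixels has zero residual, so under this assumed definition a fully straightened text-line corresponds to 100% of the error being removed.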

From Table 1, it is observed that 99.89% of the error is removed from Figure 2, a Devanagari text-line document image; 97.78%, 94.00%, and 99.65% of the error is removed from Figure 6a, 6b, and 6c, respectively, which are English text-line document images with different font styles; 98.19% is removed from Figure 12b, a Devanagari text-line document image; 99.78% from Figure 12c, an English text-line document image; and 99.82% from Figure 12d, a Chinese text-line document image.

The proposed method was tested on handwritten as well as machine-printed Devanagari, English, and a few Chinese text images. The detection of the curve and its subsequent correction worked very well on all sample images. The experimental results of the proposed method are promising, as shown in Figures 10 and 12.

Table 1 Evaluation results with respect to the measure used

Figure 12

The results of multi-script text-line segmentation. (a) Input image. (b) Devanagari text-line 1. (c) English text-line 2. (d) Chinese text-line 3. (e) Text-line 1. (f) Text-line 2. (g) Text-line 3. (h) Text-line 1. (i) Text-line 2. (j) Text-line 3.

V. Conclusions

In this paper, we presented a curved text-line straightening (correction) technique for text images having wide variations in font, layout, and size, covering machine-printed as well as handwritten Devanagari, English, and Chinese text-lines. The proposed straightening algorithm was tested on 140 images; more test samples could reveal further merits or demerits of the algorithm. The results for a few images need further correction to improve the performance of OCR systems. Although we have experimented only with Devanagari, English, and Chinese text-lines, the proposed algorithm may also be useful for other languages and scripts such as Brahmi, Grantha, Sinhalese, Bali, etc. The main contribution of this work is towards the development of language/script-independent OCR systems.

The proposed approach can handle multiple font sizes and types and multi-script text-lines within a single document. It is limited to the straightening of text-lines and words, and it does not work well when there is no gap between adjacent text-lines. The size of the structuring element is set manually in the proposed approach; automating this selection is required to improve its accuracy.

To the best of our knowledge, no method for correcting the orientation of curved text-lines has been reported previously.

Most of the works reported on Indian languages deal with straight text-line documents. Detailed studies on curved, multi-oriented, or skewed text-line documents have rarely been undertaken by researchers developing script/language-independent OCR systems.