Abstract
Stylistic text can be found on signboards, street and organization boards and logos, bulletin boards, announcements, advertisements, dangerous goods plates, warning notices, etc. In stylistic text images, the text-lines within an image may have different orientations: they may be curved or may not be parallel to each other. As a result, extraction and subsequent recognition of individual text-lines and words in such images is a difficult task. In this paper, we propose a novel scheme for straightening curved text-lines using dilation, flood fill, robust thinning, and B-spline curve fitting. In the proposed scheme, dilation is first applied on individual text-lines to cover the text area within a certain boundary. Next, thinning is applied to get the path of the text, and the path is approximated using a B-spline. The angle between the normal at a point on the curve and the vertical line is then computed, and finally each point on the text is visited and rotated by its corresponding angle. The proposed methodology is tested on a variety of text images containing text-lines in Devanagari, English, and Chinese scripts and is evaluated on the basis of visual perception and the mean square error (MSE). The MSE is calculated by line fitting applied on the input and output images. On the basis of the evaluation results obtained in our experiments, the proposed method is promising.
I. Introduction
A large volume of research effort has been dedicated to OCR systems. A number of algorithms [1–6] are available for this purpose, and many commercial OCR systems [7, 8] are now available in the market, but most of these systems can recognize only text images with straight (horizontal) text-lines and are designed only for a specific script or language. In addition, there are a few limitations regarding the source materials and character formatting, which make feature extraction and recognition difficult.
The increasing demand for stylistic text recognition attracts researchers to design and develop new algorithms that can handle such text, owing to its use in many potential applications, including container identification mark recognition [9], vehicle license plate recognition [10], text recognition in video and images [11–14], image retrieval [15] from databases, intelligent transport systems [16], robotics [17], and text translation services [18] that assist tourists and foreigners facing a language barrier. Stylistic text, as shown in Figure 1, can be found on greeting cards, front covers of books, organization logos, stamp seals, maps, engineering drawings, advertisements, newspaper front pages, hallmark cards, signboards, hoarding boards, archives, etc. Such text contains multi-oriented, multi-font-size, and curved text-lines or words, which result in errors in the recognition process. There are two possible ways to recognize such documents: one is to develop algorithms that can recognize words and text-lines in their actual format (curved, etc.), and the other is to convert curved text-lines into straight text-lines, preferably oriented horizontally. In the latter case, the straight text-lines must be segmented into words or characters before the recognition process. The main advantage of the proposed approach is that no specific feature extraction and classification technique or dataset is required for the recognition of such documents. In view of this, text normalization plays an important role in the recognition of stylistic text.
II. Related works
Many works are available on the recognition of document images [2, 5, 6] with straight text-lines and words. In the literature, however, only a few works address the recognition of stylistic documents [19–44].
In 1991, Xie et al. [19] proposed a pattern recognition system invariant to translation, scale change, and rotation of the pattern, and obtained 97% recognition accuracy on the 10 Arabic numerals. Also in 1991, Tang et al. [20] used a translation-ring-projection algorithm to handle multi-oriented English alphabets in another work on English stylistic text recognition.
In 2000, Adam et al. [21] proposed an approach for the recognition of multi-oriented and multi-scaled characters in engineering drawings, using the Fourier-Mellin transform to recognize the characters. This approach is limited to the recognition of a few characters and is time consuming. In 2001, Yang et al. [22] proposed a three-stage approach for multi-oriented Chinese character recognition, where the features are mainly based on geometric measures of the foreground pixels of the characters. It is limited to the Chinese language only.
In 2003, Hase et al. [23] proposed a multi-oriented character handling approach based on character types such as inclined, horizontal, vertical, and curved; it realigns the characters horizontally and then recognizes them. The main drawback of the approach of Hase et al. [23] is the distortion caused by the realignment of curved text. For rotated and/or inclined English character recognition, Hase et al. [23] used a parametric Eigen-space-based approach. This method is also limited to English character recognition and does not handle variation in font style and size or multiple scripts.
In 2005, Pal et al. [24] proposed a recognition-based approach to handle Indian multi-oriented and curved text. It uses the water reservoir concept to segment characters from stylistic documents without any skew correction, and the individual characters are then recognized. This approach is limited to recognizing Bangla and Devanagari script text only. In 2006, Hayashi et al. [25] proposed a rotation-invariant Arabic numeral recognition system in which a numeral is divided into elementary sub-patterns such as a straight line, a C-shaped line, and a 0-shaped line using a thinning algorithm, and is then recognized based on features of the sub-patterns such as curvature, angle information, length, and arc-length. Also in 2006, Pal et al. [26] proposed a method for the recognition of multi-oriented and multi-sized English characters based on the modified quadratic discriminant function (MQDF). The main drawback of that approach is that it cannot distinguish similar-looking characters such as ‘b’ and ‘q’, ‘p’ and ‘d’, ‘n’ and ‘u’, etc., because it uses rotation-invariant features.
In 2007, Monwar et al. [27] proposed an approach for recognizing printed alpha-numeric characters at different angles, in which each character is described by a small set of two-dimensional characteristic views at different angles for feature extraction. In 2008, Roy et al. [28] proposed an approach for the recognition of English characters in graphical documents containing multi-scale and multi-oriented text, presenting a support vector machine (SVM)-based scheme for such characters.
In 2010, Pal et al. [29] proposed an approach for the recognition of multi-oriented Bangla and Devnagari characters. It is also a recognition-based system, and it fails on easily confused characters of the Bangla and Devnagari scripts.
In 2011, Chiang et al. [30] proposed a general text recognition technique to handle non-homogeneous text by exploiting dynamic character grouping criteria based on the character sizes and the maximum desired string curvature. Also in 2011, Shivakumara et al. [31] proposed an approach to detect multi-oriented text in videos. The input image is first filtered with a Fourier-Laplacian filter, and K-means clustering is then used to identify candidate text regions based on the maximum difference. The skeleton of each connected component helps to separate the different text strings from each other. Finally, text string straightness and edge density are used for false-positive elimination. The method of Shivakumara et al. [31] is limited to English-language text.
All the approaches discussed above are based on the recognition of multi-oriented characters and are limited to particular scripts. No approach in the literature is based on the straightening of curved text-lines or words and on script independence. In contrast, this work proposes an approach for curved text-line straightening that can handle multiple font sizes and types and multi-script text-lines in a single document.
III. Proposed approach
In this work, the images to be processed are captured by a scanner or a camera. For the thresholding of text images, the method of Gatos et al. [32] is used. Figure 2 shows a sample curved text-line image.
Curved or stylistic text present in document images poses problems in segmentation and recognition, so it is important to straighten the text-lines before recognition. This paper presents a method based on the following steps:
Step 1
Apply the morphological dilation operation [33] on the text document using a disk-shaped structuring element of radius 17, so that the region grows in the form of circular bubbles until all foreground objects (black pixels) are covered within a certain boundary. The radius of the structuring element is a manually selected parameter in our work. It can vary from image to image, but in our experiments a radius of 17 was found to be the most appropriate. Figure 3 shows the curved text image after the dilation operation.
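As a rough illustration of Step 1, the following Python sketch dilates a binary image with a disk-shaped structuring element. It is not the authors' implementation: the function names `disk_offsets` and `dilate` are hypothetical, the image is assumed to be a list of rows with 1 marking text pixels, and the disk is a plain Euclidean disk, which may differ slightly from the structuring element used in the experiments. A small radius is used so the toy example stays readable.

```python
def disk_offsets(radius):
    """Return the (dy, dx) offsets covered by a Euclidean disk of the given radius."""
    r2 = radius * radius
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= r2]


def dilate(image, radius):
    """Binary dilation of a list-of-rows image in which 1 marks foreground (text)."""
    height, width = len(image), len(image[0])
    offsets = disk_offsets(radius)
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x]:
                # Stamp the disk around every foreground pixel.
                for dy, dx in offsets:
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        out[yy][xx] = 1
    return out


# Two isolated "characters" six pixels apart merge into one blob at radius 3.
image = [[0] * 15 for _ in range(9)]
image[4][4] = 1
image[4][10] = 1
blob = dilate(image, 3)
```

With a radius at least half the gap between neighbouring characters, the characters of one text-line fuse into a single region, which is the effect the step relies on.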
Step 2
To approximate the black region of the curved text-line, we need pixels in a certain order that follow a path coinciding with the curve. This is accomplished by applying thinning to convert the region to one-pixel-thick lines. Since a general thinning method produces many unwanted edges along the curve, as shown in Figure 4a, the thinning method given in [34] is applied to overcome this shortcoming. Figure 4b shows the result of the smoothed thinning. The thinning is based on the following steps:
- The image is inverted, with ‘1’ as black and ‘0’ as white from the MATLAB perspective.
- Erosion is applied on the dilated image using a disk-shaped structuring element of size ‘17’.
- The morphological thinning operation is applied on the eroded image with ‘10’ iterations.
- The output image is inverted again so that the further steps of curve straightening work properly.
Step 3
In this step, curve fitting is applied on the result obtained from the previous step. For the curve fitting, a B-spline curve [35–41] is used instead of a polynomial curve to approximate the pixel data, because a polynomial curve fails to approximate a complex curve such as the one shown in Figure 2. B-spline and Bezier curves have a very similar form, but a B-spline curve is specified by more information (a degree and a knot vector in addition to the control points). Approximating even a simple curve with a high-degree polynomial suffers from Runge's phenomenon [42–44], so increasing the degree of the polynomial fitted to the pixel data does not always increase the accuracy. Hence, in the proposed approach, a B-spline curve is used, which is free from Runge's phenomenon even at higher degrees. The B-spline curve is described as follows:
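Given n + 1 control points $P_0, P_1, \ldots, P_n$ and a knot vector $U = \{u_0, u_1, \ldots, u_m\}$, the B-spline curve of degree p defined by these control points and the knot vector U is

$$C(u) = \sum_{i=0}^{n} N_{i,p}(u)\, P_i$$

where the $N_{i,p}(u)$ are the B-spline basis functions of degree p.
The i th B-spline basis function of degree p, written as $N_{i,p}(u)$, is defined recursively as follows:

$$N_{i,0}(u) = \begin{cases} 1 & \text{if } u_i \le u < u_{i+1} \\ 0 & \text{otherwise} \end{cases}$$

$$N_{i,p}(u) = \frac{u - u_i}{u_{i+p} - u_i}\, N_{i,p-1}(u) + \frac{u_{i+p+1} - u}{u_{i+p+1} - u_{i+1}}\, N_{i+1,p-1}(u)$$

where any quotient of the form 0/0 is taken to be 0.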
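To make the recursive definition of the basis functions concrete, here is a minimal Python sketch of the standard Cox-de Boor recursion. The function name `bspline_basis` and the example clamped knot vector are assumptions chosen for illustration; the right end of the knot range is treated as closed so that the last parameter value is covered.

```python
def bspline_basis(i, p, u, knots):
    """Value of the i-th B-spline basis function of degree p at parameter u."""
    if p == 0:
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        # Close the last non-empty span on the right so that u = u_m is covered.
        if u == knots[-1] and knots[i] < knots[i + 1] and knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    value = 0.0
    left_den = knots[i + p] - knots[i]
    right_den = knots[i + p + 1] - knots[i + 1]
    # Terms with a zero denominator are dropped (the 0/0 = 0 convention).
    if left_den > 0:
        value += (u - knots[i]) / left_den * bspline_basis(i, p - 1, u, knots)
    if right_den > 0:
        value += (knots[i + p + 1] - u) / right_den * bspline_basis(i + 1, p - 1, u, knots)
    return value


# Example: cubic basis (p = 3) with five control points on a clamped knot vector.
example_knots = [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0]
sums = [sum(bspline_basis(i, 3, u, example_knots) for i in range(5))
        for u in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

Each entry of `sums` should equal 1 up to rounding, since the basis functions of a given degree sum to one over the knot range. The plain recursion recomputes lower-degree values many times, which is acceptable for a sketch but not for large inputs.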
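Suppose we have n + 1 data points $D_0, D_1, \ldots, D_n$ and want to find a B-spline curve that follows the shape of the data polygon without necessarily containing the data points. Two more inputs are then needed: the number of control points, h + 1, and the degree p, where n > h ≥ p ≥ 1. With these two inputs, a set of parameters and a knot vector can be determined. Let the parameters be $t_0, t_1, \ldots, t_n$; the number of parameters equals the number of data points. The approximating B-spline of degree p is given by

$$C(u) = \sum_{i=0}^{h} N_{i,p}(u)\, P_i$$

where $P_0, P_1, \ldots, P_h$ are the h + 1 unknown control points.
Requiring the curve to pass through the first and last data points, $D_0 = C(0) = P_0$ and $D_n = C(1) = P_h$, leaves only h − 1 unknown control points. Taking this into consideration, the curve equation becomes

$$C(u) = N_{0,p}(u)\, D_0 + \sum_{i=1}^{h-1} N_{i,p}(u)\, P_i + N_{h,p}(u)\, D_n$$

The control points $P_1, \ldots, P_{h-1}$ are to be found such that the function $f(P_1, \ldots, P_{h-1})$ is minimized. The approximation is done using the least-squares method. The sum of all squared distances is

$$f(P_1, \ldots, P_{h-1}) = \sum_{k=1}^{n-1} \left| D_k - C(t_k) \right|^2$$

Writing $Q_k = D_k - N_{0,p}(t_k)\, D_0 - N_{h,p}(t_k)\, D_n$ and setting the partial derivative of f with respect to each unknown $P_g$ to zero gives one equation per unknown. Since we have h − 1 variables, g runs from 1 to h − 1 and there are h − 1 such equations:

$$\sum_{i=1}^{h-1} \left( \sum_{k=1}^{n-1} N_{g,p}(t_k)\, N_{i,p}(t_k) \right) P_i = \sum_{k=1}^{n-1} N_{g,p}(t_k)\, Q_k, \qquad g = 1, \ldots, h-1$$

The system of linear equations can be rewritten as

$$(N^{T} N)\, P = Q$$

where N is the (n − 1) × (h − 1) matrix whose entry in row k and column i is $N_{i,p}(t_k)$, P stacks the unknown control points $P_1, \ldots, P_{h-1}$, and the g th row of Q is $\sum_{k=1}^{n-1} N_{g,p}(t_k)\, Q_k$.
Since N and Q are known, solving this system of linear equations for P gives the desired control points.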
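The least-squares fit described in this step can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the parameters are chosen by chord length, the knot vector is clamped with uniformly spaced interior knots, and the names `basis`, `solve`, and `fit_bspline` are hypothetical. The first and last control points are fixed to the first and last data points, and the remaining ones come from the normal equations built from the basis values at the interior parameters.

```python
import math


def basis(i, p, u, knots):
    """Cox-de Boor value of the i-th degree-p basis function (half-open spans)."""
    if p == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    value = 0.0
    left_den = knots[i + p] - knots[i]
    right_den = knots[i + p + 1] - knots[i + 1]
    if left_den > 0:
        value += (u - knots[i]) / left_den * basis(i, p - 1, u, knots)
    if right_den > 0:
        value += (knots[i + p + 1] - u) / right_den * basis(i + 1, p - 1, u, knots)
    return value


def solve(a, b):
    """Solve a x = b (b has several columns) by Gaussian elimination with pivoting."""
    n, cols = len(a), len(b[0])
    aug = [a[r][:] + b[r][:] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + cols):
                aug[r][c] -= factor * aug[col][c]
    x = [[0.0] * cols for _ in range(n)]
    for r in reversed(range(n)):
        for j in range(cols):
            rest = sum(aug[r][c] * x[c][j] for c in range(r + 1, n))
            x[r][j] = (aug[r][n + j] - rest) / aug[r][r]
    return x


def fit_bspline(data, h, p):
    """Fit h + 1 control points of a degree-p B-spline to 2D data points."""
    n = len(data) - 1
    # Chord-length parameters t_0 = 0, ..., t_n = 1 (an assumed choice).
    dists = [math.dist(data[k], data[k + 1]) for k in range(n)]
    total = sum(dists)
    t = [0.0]
    for d in dists:
        t.append(t[-1] + d / total)
    # Clamped knot vector with uniform interior knots (an assumed choice).
    interior = [j / (h - p + 1) for j in range(1, h - p + 1)]
    knots = [0.0] * (p + 1) + interior + [1.0] * (p + 1)
    # Basis values at the interior parameters t_1 .. t_{n-1}.
    rows = [[basis(i, p, t[k], knots) for i in range(h + 1)] for k in range(1, n)]
    # Q_k = D_k - N_0(t_k) D_0 - N_h(t_k) D_n for each coordinate.
    q = [[data[k][c] - row[0] * data[0][c] - row[h] * data[n][c] for c in range(2)]
         for k, row in zip(range(1, n), rows)]
    normal = [[sum(row[g] * row[i] for row in rows) for i in range(1, h)]
              for g in range(1, h)]
    rhs = [[sum(row[g] * qk[c] for row, qk in zip(rows, q)) for c in range(2)]
           for g in range(1, h)]
    inner = solve(normal, rhs)
    return [list(data[0])] + inner + [list(data[n])], knots


# Example: points sampled from the straight line y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(11)]
control_points, knots = fit_bspline(data, h=4, p=3)
```

For data lying on a straight line, the fitted control points should also lie on that line, which gives a quick sanity check. With strongly uneven point spacing, uniform interior knots can make the normal equations ill-conditioned, so a knot placement derived from the parameters may be preferable in practice.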
Step 4
Stylistic documents may have more than one curved or multi-oriented text-line, as shown in Figure 1. Before the text-lines are straightened, the individual text-lines must first be extracted. Multi-text-line segmentation is handled in the following manner:
A. Dilation is applied on each of the text-lines to cover all the foreground pixels within the black boundary. Figure 5 shows the dilation of the curved multi-text-lines using a circular structuring element of radius 17. The size of the structuring element is chosen on the basis of the experiments performed in this work.
B. A modified flood fill algorithm is applied to find the number of different text-lines in the image and to separate them from the input image. The steps 1 to 4 mentioned earlier are applied for the straightening of the images obtained after the segmentation process. The modified flood fill algorithm marks the pixels of a region such that all the pixels of the same region get the same number. It creates a blank image for each region and copies all the pixels of the original (undilated) image to the image of the corresponding region. Hence, separate images are obtained for separate text-lines. The segmented images are shown in Figure 6.
However, the segmentation step can produce errors in the extraction of individual text-lines from multi-line images because of the following problems arising during the morphological operations:
- The inter-character spacing in a stylistic text-line may be large. However, the text image to be processed is assumed to be intended for isolated character recognition, so large inter-character spacing will not affect the recognition rate much.
- When two or more text-lines are very close to each other, they are merged into one text-line. Because of this limitation of morphology, our experiments use only images that contain a sufficient gap between text-lines.
The following pseudo code outlines the modified flood fill algorithm.
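A minimal Python sketch of region labeling by flood fill, in the spirit of the description above, is given here. It is an illustrative assumption rather than the authors' exact procedure: the names `label_regions` and `split_text_lines` are hypothetical, images are lists of rows with 1 marking foreground, and regions are taken to be 8-connected.

```python
from collections import deque


def label_regions(mask):
    """Give every 8-connected foreground region of a binary mask its own number."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for sy in range(height):
        for sx in range(width):
            if not mask[sy][sx] or labels[sy][sx]:
                continue
            # Unlabeled foreground pixel: start a new region and flood it.
            count += 1
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def split_text_lines(original, dilated):
    """Copy the pixels of the undilated image into one blank image per region."""
    labels, count = label_regions(dilated)
    lines = [[[0] * len(row) for row in original] for _ in range(count)]
    for y, row in enumerate(original):
        for x, value in enumerate(row):
            if value and labels[y][x]:
                lines[labels[y][x] - 1][y][x] = 1
    return lines


# Toy example: two dilated blobs, each covering one short vertical stroke.
dilated = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
original = [
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
]
lines = split_text_lines(original, dilated)
```

A queue-based fill is used instead of recursion so that large regions do not exhaust the call stack.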
In this step, the least square method [45] is used to approximate the pixel data into B-spline curve. Due to the discrete domain of the sample space, the output image may suffer from aliasing effects as shown in Figures 7 and 8a. This will cause serrated character edges in the output image as shown in Figure 8a. To remove the artifact produced by aliasing, anti-aliased pixels are used with bilinear interpolation [33] as shown in Figure 8b. Figure 9 shows the final smoothed output of the proposed approach of straightening of the curved text-line.
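The bilinear interpolation used for anti-aliasing can be illustrated with a short Python sketch. It assumes a grayscale image stored as a list of rows and a hypothetical helper `bilinear_sample`; when a rotated pixel position falls between grid points, the value is blended from the four surrounding pixels instead of being snapped to the nearest one.

```python
import math


def bilinear_sample(image, y, x):
    """Sample a grayscale image (list of rows) at a real-valued position (y, x)."""
    height, width = len(image), len(image[0])
    # Clamp so that the 2x2 neighbourhood stays inside the image.
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    fy, fx = y - y0, x - x0
    # Blend horizontally on the two rows, then vertically between them.
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom


tile = [[0, 100],
        [100, 200]]
center = bilinear_sample(tile, 0.5, 0.5)
```

Sampling the centre of the 2 × 2 tile returns the average of its four pixels, which is how intermediate gray levels soften serrated character edges.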
Steps 1 to 4 are applied on the text-lines obtained after the modified flood fill technique, and the results obtained are shown in Figure 10.
IV. Experimental results and discussions
The proposed straightening algorithm is tested on 140 images containing a variety of stylistic multi-oriented text collected from newspapers, books, notebooks, journal articles, magazines, maps, engineering drawings, hoardings, and sign and notice boards. Some images of the dataset contain other information along with the text. In view of this, we manually separated the multi-oriented text from the other objects present in the images with the help of MS Paint, as shown in Figure 11.
The method is evaluated on the basis of visual perception and the mean square error. The mean square error is calculated by line fitting applied on the input and output images with the help of the following equations:
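The least-squares line uses a straight line

$$y = a + bx$$

to approximate the given set of data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where n ≥ 2.
The best fitting curve f(x) has the least square error, i.e.,

$$\Pi = \sum_{i=1}^{n} \left[ y_i - f(x_i) \right]^2 = \sum_{i=1}^{n} \left[ y_i - (a + b x_i) \right]^2 = \min$$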
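The line-fitting evaluation can be sketched in Python as below. The closed-form slope and intercept are the standard least-squares solution for a straight line; the names `fit_line`, `mean_square_error`, and `error_removed_percent` are hypothetical, and the percentage formula in particular is an assumption about how an "error removed" figure could be derived from the input and output MSE values, not a formula stated in the text.

```python
def fit_line(points):
    """Least-squares intercept a and slope b of the line y = a + b x."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b


def mean_square_error(points):
    """Mean squared vertical distance of the points from their best-fit line."""
    a, b = fit_line(points)
    return sum((y - (a + b * x)) ** 2 for x, y in points) / len(points)


def error_removed_percent(mse_input, mse_output):
    """Assumed definition: relative reduction of the MSE, in percent."""
    return (mse_input - mse_output) / mse_input * 100.0


# Text pixels on a curved path versus pixels on a straight baseline.
curved = [(x, x * x) for x in range(-2, 3)]
straight = [(x, 3 + 0.5 * x) for x in range(6)]
mse_curved = mean_square_error(curved)
mse_straight = mean_square_error(straight)
```

A curved text-line leaves a large residual around its best-fit line, while a straightened one leaves a residual close to zero, so the drop in MSE reflects how much curvature was removed.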
From Table 1, it is observed that 99.89% of the error is removed from Figure 2, a Devanagari text-line document image; 97.78%, 94.00%, and 99.65% from Figures 6a, 6b, and 6c, respectively, which are English text-line document images with different font styles; 98.19% from Figure 12b, a Devanagari text-line document image; 99.78% from Figure 12c, an English text-line document image; and 99.82% from Figure 12d, a Chinese text-line document image.
The proposed method was tested on handwritten as well as machine-printed Devanagari, English, and a few Chinese text images. The detection of the curve and its subsequent correction worked very well with all sample images. The experimental results of the proposed method are promising, as shown in Figures 10 and 12.
V. Conclusions
In this paper, we presented a curved text-line straightening (correction) technique for text images with wide variations in font, layout, and size, covering machine-printed as well as handwritten Devanagari, English, and Chinese text-lines. The proposed straightening algorithm is tested on 140 images, but more test samples could reveal further merits or demerits of the proposed algorithm. The results for a few images need to be corrected further for better performance of OCR systems. Although we have done experiments only on Devanagari, English, and Chinese text-lines, the proposed algorithm can also be useful for other languages and scripts such as Brahmi, Grantha, Sinhalese, Bali, etc. The main contribution of this work is towards the development of language/script-independent OCR systems.
The proposed approach can handle multiple font sizes and types and multi-script text-lines within a single document. It is limited to the straightening of text-lines and words only and does not work well when there is no gap between text-lines. The size of the structuring element is set manually in the proposed approach; automating this selection is required to enhance its accuracy.
To the best of our knowledge, no method for correcting the orientation of curved text-lines has been reported previously.
Most of the works reported on Indian languages deal with straight text-line documents. Detailed studies on curved, multi-oriented, or skewed text-line documents have rarely been undertaken in the development of script/language-independent OCR systems.
References
Singh BM, Mittal A, Ghosh D: An evaluation of different feature extractors and classifiers for offline handwritten Devanagari character recognition. Journal of Pattern Recognition and Research 2011, 06(2):269-277. 10.13176/11.302
Ghosh D, Dube T, Shivaprasad AP: Script recognition: a review. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32: 2142-2161.
Pal U, Chaudhuri BB: Indian script character recognition: a survey. Pattern Recognition 2004, 37: 1887-1899. 10.1016/j.patcog.2004.02.003
Doermann D, Liang J, Li H: Progress in camera-based document image analysis. Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR), IEEE, Volume 1 2003, 606-610.
Arica N, Yarman-Vural FT: An overview of character recognition focused on off-line handwriting. IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications And Reviews 2001, 31(2):216-233.
Amin A: Off-line Arabic character-recognition: the state of the art. Pattern Recognition 1998, 31: 517-530. 10.1016/S0031-3203(97)00084-8
ABBYY Software. http://www.finereader.com
Free OCR Software. http://www.softi.co.uk
Kumano S, Miyamoto K, Tamagawa M, Ikeda H, Kan K: Development of container identification mark recognition system. Trans. Inst. Electron. Inform. Commun. Eng. D-I 2001, J84D-II(6):1073-1083.
Cui Y, Huang Q: Character extraction of license plates from video. Proceeding of International Conference on CVPR 1997, 502-507.
Hua S, Liu W, Zhang HJ: Automatic performance evaluation for video text detection. In International Conference on Document Analysis and Recognition (ICDAR). Seattle, WA, USA; 2001:545-550.
Jain AK, Yu B: Automatic text location in images and video frames. Pattern Recognition 1998, 31(12):2055-2076. 10.1016/S0031-3203(98)00067-3
Lim Y, Choi S, Lee S: Text extraction in MPEG compressed video for content-based indexing. Proceeding of 15th International Conference on Pattern Recognition (ICPR), Volume 4 2000, 409-412.
Pavlidis T: Recognition of printed text under realistic conditions. Pattern Recognition Letters 1993, 14(4):317-326. 10.1016/0167-8655(93)90097-W
Sato T, Kanade T, Hughes EK, Smith MA: Video OCR for digital news archives. IEEE International Workshop on Content-Based Access of Image and Video Database 1998.
Sun Q, Lu Y: Text location in camera-captured guide post images. Proceeding of Chinese Conference on Pattern Recognition (CCPR), IEEE 2010, 1-4.
Xilin C, Jie Y, Jing Z, Waibel A: Automatic detection and recognition of signs from natural scenes. IEEE Trans. Image Process 2004, 13(1):87-99. 10.1109/TIP.2003.819223
Bascon SM, Arroyo SL, Jimenez PG, Moreno HG, Ferreras LF: Road-sign detection and recognition based on support vector machines. IEEE Transactions on Intelligent Transportation Systems 2007, 8(2):264-278.
Xie Q, Kobayashi A: A construction of pattern recognition system invariant of translation, scale-change and rotation transformation of pattern. Transactions of the Society of Instrument and Control Engineers 1991, 27: 1167-1174.
Tang YY, Cheng HD, Suen CY: Translation-ring-projection (TRP) algorithm and its VLSI implementations. Character and Hand-writing Recognition, World Scientific 1991, 25-56.
Adam S, Ogier JM, Carlon C, Mullot R, Labiche J, Gardes J: Symbol and character recognition: application to engineering drawing. International Journal of Document Analysis and Recognition 2000, 3: 89-101.
Yang TN, Wang SD: A rotation invariant printed Chinese character recognition system source. Pattern Recognition Letters 2001, 22: 85-95. 10.1016/S0167-8655(00)00089-1
Hase H, Shinokawa T, Yoneda M, Suen CY: Recognition of rotated characters by Eigen-space. Proceeding of 7th International Conference on Document Analysis and Recognition (ICDAR) 2003, 731-735.
Pal U, Tripathy N: Recognition of Indian multi-oriented and curved text. Proceedings of the 8th International Conference on Document Analysis and Recognition (ICDAR) 2005, 141-145.
Hayashi T, Takagi N: A consideration on rotation invariant character recognition. World Automation Congress 2006, 1-6.
Pal U, Kimura F, Roy K, Pal T: Recognition of English multi-oriented characters. Proceedings of the International Conference on Pattern Recognition 2006, 873-876.
Monwar M, Haque W, Paul PP: A new approach for rotation invariant optical character recognition using Eigen digit. Proceedings of the Canadian Conference on Electrical and Computer Engineering 2007, 1317-1320.
Roy PP, Pal U, Lladós J, Kimura F: Convex Hull based approach for multi-oriented character recognition from graphical documents. Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), IEEE 2008, 1-4.
Pal U, Roy PP, Tripathy N, Llados J: Multi-oriented Bangla and Devnagari text recognition. Pattern Recognition 2010, 43: 4124-4136. 10.1016/j.patcog.2010.06.017
Chiang YY, Knoblock CA: Recognition of multi-oriented, multi-sized, and curved text. Proceeding of International Conference on Document Analysis and Recognition (ICDAR) 2011, 1399-1403.
Shivakumara P, Phan TQ, Tan CL: A Laplacian approach to multi-oriented text detection in video. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33(2):412-419.
Gatos B, Pratikakis I, Perantonis SJ: Adaptive degraded document image binarization. Pattern Recognition 2006, 39: 317-327. 10.1016/j.patcog.2005.09.010
Gonzalez RC, Woods RE: Digital image processing. 3rd edition. Pearson Education Asia; 2008.
Singh BM, Goswami S, Goyal P, Mittal A: A robust thinning algorithm for straightening of curved text line. In Proceedings of the International Conference on Soft Computing for Problem Solving (SocProS 2011), December 20–22, Advances in Intelligent and Soft Computing . Springer 2012, 131: 903-910.
Boor CD: A practical guide to splines. Springer Verlag, Berlin - Heidelberg - New York; 1978:113-114.
Knott GD: Interpolating cubic splines. Springer; 2000:151.
Lee ETY: A simplified B-spline computation routine. Computing Springer-Verlag 1982, 29(4):365-371.
Lee ETY: Comments on some B-spline algorithms. Computing, Springer-Verlag 1986, 36(3):229-238.
Brinks R: On the convergence of derivatives of B-splines to derivatives of the Gaussian function. Comp. Appl. Math 2008, 27: 1.
Prautzsch H, Boehm W, Paluszny M: Bezier and B-spline techniques. Springer, Berlin-Heidelberg-New York; 2002:60-66.
Splitting a uniform B-spline curve. http://www.idav.ucdavis.edu/education/CAGDNotes/Quadratic-Uniform-B-Spline-Curve-Splitting/Quadratic-Uniform-B-Spline-Curve-Splitting.html
Runge C: Über empirische Funktionen und die Interpolation zwischen äquidistanten Ordinaten. Zeitschrift für Mathematik und Physik 1901, 46: 224-243. Available at http://www.archive.org/details/zeitschriftfrma12runggoog
Berrut JP, Trefethen LN: Barycentric Lagrange interpolation. SIAM Review 2004, 46: 501-517. 10.1137/S0036144502417715
Dahlquist G, Bjork A: Equidistant Interpolation and the Runge Phenomenon. Numerical Methods 1974, 101-103.
Deyi Zhang BS: Least squares approximation by splines with free knots: optimization by hybrid of global and local search. A Thesis in Mathematics and Statistics; 2010.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Singh, B.M., Mittal, A. & Ghosh, D. A novel method for straightening curved text-lines in stylistic documents. J Image Video Proc 2014, 36 (2014). https://doi.org/10.1186/1687-5281-2014-36
DOI: https://doi.org/10.1186/1687-5281-2014-36