Abstract
This paper presents a new algorithm to extract shape-oriented feature vectors using pixel intensities from offline printed Devanagari script documents. Almost all characters of the script carry a Shirorekha (header line) along their upper portion, which joins adjacent characters and makes segmentation a difficult and complex problem. The problem becomes more challenging when images have multiple gray levels, are skewed, or are noisy. A fast and effective algorithm is designed using gradient structural information, and its performance is evaluated on a challenging dataset of 80 printed documents containing around 87,000 characters. Experimental results show that the proposed algorithm achieves 98.56% accuracy, which is 2.66% higher than that reported in the literature. The proposed algorithm is also more time efficient and less complex than existing methods.
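To illustrate why the header line complicates segmentation, the following minimal sketch uses a classical projection-profile approach on a toy binary word image. It is not the gradient-based algorithm proposed in this paper; the function names, the `min_fill` threshold, and the toy image are all assumptions made for illustration. With the Shirorekha present, every column contains ink and the word looks like a single blob; once the header rows are blanked, the gaps between characters become visible.

```python
def header_rows(word, min_fill=0.9):
    """Rows whose ink fraction is at least min_fill (hypothetical threshold)."""
    width = len(word[0])
    return [r for r, row in enumerate(word) if sum(row) / width >= min_fill]


def ink_column_runs(word):
    """Return (start, end) column spans, end exclusive, that contain ink."""
    width = len(word[0])
    has_ink = [any(row[c] for row in word) for c in range(width)]
    runs, start = [], None
    for c, ink in enumerate(has_ink):
        if ink and start is None:
            start = c                    # a run of inked columns begins
        elif not ink and start is not None:
            runs.append((start, c))      # the run ended at column c
            start = None
    if start is not None:
        runs.append((start, width))      # run touches the right edge
    return runs


def split_below_header(word, min_fill=0.9):
    """Blank the detected header rows, then find character column spans."""
    header = set(header_rows(word, min_fill))
    body = [[0] * len(row) if r in header else row
            for r, row in enumerate(word)]
    return ink_column_runs(body)


# Toy binary word image (1 = ink): 6 rows by 9 columns, an assumption.
word = [[0] * 9 for _ in range(6)]
word[1] = [1] * 9                        # header line across the full width
for r in range(2, 6):
    for c in (1, 2, 5, 6, 7):            # two glyph bodies under the header
        word[r][c] = 1

print(header_rows(word))                 # [1]
print(ink_column_runs(word))             # [(0, 9)]  header hides the gaps
print(split_below_header(word))          # [(1, 3), (5, 8)]
```

A fixed fill threshold of this kind is fragile on skewed, noisy, or multi-gray-level scans, which is the setting the abstract describes as most challenging.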
Cite this article
Jindal, K., Kumar, R. A Novel Shape-Based Character Segmentation Method for Devanagari Script. Arab J Sci Eng 42, 3221–3228 (2017). https://doi.org/10.1007/s13369-017-2420-7
DOI: https://doi.org/10.1007/s13369-017-2420-7