Abstract
In recent years, multi-lingual character segmentation and recognition has attracted a wide range of researchers. However, variations in character structure across languages, in individual writing styles, and in font size remain challenges that call for further improvement. To this end, in this paper we propose a novel multi-lingual handwritten character recognition framework for three languages: Bangla, Kannada and Telugu. In the segmentation phase, the framework performs word segmentation with Adaptive Projection Profiling (APP) and character segmentation with an Edge Density Filter (EDF). In the recognition phase, we use gradient-based feature descriptors to extract a Composite Feature Vector (CFV) from each handwritten character image, and these vectors are then fed to a Support Vector Machine (SVM) classifier for recognition. In the experimental evaluation, we simulated the proposed model on all three language scripts. The results show that the proposed model outperforms the conventional method, with an average improvement in recognition accuracy of 3% in both cross-validation and test simulations.
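Projection profiling, the basis of the word-segmentation step above, counts ink pixels in each column of a binarized text line and treats sufficiently wide runs of empty columns as boundaries. The following is a minimal sketch of a plain (non-adaptive) projection-profile segmenter. It assumes a binary image given as a list of rows with 1 for ink and 0 for background. It is not the authors' APP method, whose adaptive rule is not described here. The names `vertical_projection` and `segment_by_projection` and the fixed `min_gap` threshold are hypothetical, chosen for illustration only.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: binary image as rows, 1 = ink, 0 = background.
Image = List[List[int]]


def vertical_projection(img: Image) -> List[int]:
    """Count ink pixels in each column (the vertical projection profile)."""
    if not img:
        return []
    width = len(img[0])
    return [sum(row[c] for row in img) for c in range(width)]


def segment_by_projection(img: Image, min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) column spans of ink regions, end exclusive.

    Runs of empty columns shorter than `min_gap` (a fixed, hypothetical
    threshold) are bridged as intra-word spacing; a run of at least
    `min_gap` empty columns closes the current segment.
    """
    profile = vertical_projection(img)
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first column of the open segment
    gap = 0  # length of the current run of empty columns inside a segment
    for c, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = c
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # Segment ends just before the empty run began.
                spans.append((start, c - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close a segment that runs to the right edge, dropping trailing blanks.
        spans.append((start, len(profile) - gap))
    return spans
```

For example, on a two-row image `[[1, 1, 0, 1, 0, 0, 1], [0, 1, 0, 0, 0, 0, 1]]`, whose profile is `[1, 2, 0, 1, 0, 0, 2]`, calling `segment_by_projection(img, min_gap=2)` returns `[(0, 4), (6, 7)]` because the single blank column is bridged, while `min_gap=1` returns `[(0, 2), (3, 4), (6, 7)]`. An adaptive scheme would presumably derive the gap threshold from the observed spacing in each line instead of fixing it; that is an assumption, not a description of APP.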
Data availability statement
The datasets generated or analyzed during this study are not publicly available because they form part of the author's Ph.D. thesis submission, but they are available from the corresponding author on reasonable request.
Ethics declarations
Competing interests
The authors have no competing interests in any material discussed in this research.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Vishwanath, N.V., Manjunathachari, K. & Prasad, K.S. Multi-lingual character segmentation and recognition based on adaptive projection profiles and composite feature vectors. Multimed Tools Appl 82, 24247–24268 (2023). https://doi.org/10.1007/s11042-023-14523-w
DOI: https://doi.org/10.1007/s11042-023-14523-w