
Multi-lingual character segmentation and recognition based on adaptive projection profiles and composite feature vectors

Multimedia Tools and Applications

Abstract

In recent years, multi-lingual character segmentation and recognition has attracted a wide range of researchers. However, variations in character structure across languages, in individual writing styles and in font sizes remain challenges that call for further improvement. To this end, we propose a novel multi-lingual handwritten character recognition framework for three languages: Bangla, Kannada and Telugu. In the segmentation phase, the framework performs word segmentation with Adaptive Projection Profiling (APP) and character segmentation with an Edge Density Filter (EDF). In the recognition phase, gradient-based feature descriptors are used to extract a Composite Feature Vector (CFV) from handwritten character images, which is then fed to a Support Vector Machine (SVM) for recognition. For experimental evaluation, we simulated the proposed model on the three language scripts. The results show that the proposed model outperforms the conventional method, with an average improvement in recognition accuracy of 3% for both cross-validation and test simulations.
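The abstract does not state how APP chooses its thresholds, so the following is only a minimal Python sketch of generic projection-profile word segmentation, not the authors' method. It assumes a binarized text line given as rows of 0/1 pixels (1 = ink). The rule of splitting at gaps wider than the mean gap is a hypothetical stand-in for the adaptive step, and all function names are illustrative.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # half-open column range [start, end)


def column_profile(img: List[List[int]]) -> List[int]:
    """Vertical projection profile: number of ink pixels in each column."""
    if not img:
        return []
    return [sum(col) for col in zip(*img)]


def ink_runs(profile: List[int]) -> List[Run]:
    """Maximal runs of consecutive columns that contain at least one ink pixel."""
    runs: List[Run] = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_words(img: List[List[int]]) -> List[Run]:
    """Merge ink runs into words, splitting only at gaps wider than the mean gap.

    The mean-gap threshold is an assumption made for illustration; it adapts
    to the spacing of each line instead of using a fixed pixel width.
    """
    runs = ink_runs(column_profile(img))
    if len(runs) < 2:
        return runs
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(runs, runs[1:])]
    threshold = sum(gaps) / len(gaps)
    words = [runs[0]]
    for gap, run in zip(gaps, runs[1:]):
        if gap > threshold:
            words.append(run)                      # wide gap: start a new word
        else:
            words[-1] = (words[-1][0], run[1])     # narrow gap: same word
    return words


row = [1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1]
line_image = [row, row]
words = segment_words(line_image)
```

In this toy line the gaps between ink runs are 1, 4 and 1 columns wide, so only the 4-column gap exceeds the mean and the line splits into two words. When all gaps are equal, no gap exceeds the mean and the whole line is returned as a single word, which is one reason a real adaptive scheme would need a more careful threshold.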

Data availability statement

The datasets generated or analyzed during this study are not publicly available because of the author's pending Ph.D. (research) thesis submission, but they are available from the corresponding author on reasonable request.

Author information

Corresponding author

Correspondence to Neerugatti Varipally Vishwanath.

Ethics declarations

Competing interests

The authors have no competing interests in any material discussed in this research.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Vishwanath, N.V., Manjunathachari, K. & Prasad, K.S. Multi-lingual character segmentation and recognition based on adaptive projection profiles and composite feature vectors. Multimed Tools Appl 82, 24247–24268 (2023). https://doi.org/10.1007/s11042-023-14523-w

  • DOI: https://doi.org/10.1007/s11042-023-14523-w
