
A GA based hierarchical feature selection approach for handwritten word recognition

  • Samir Malakar
  • Manosij Ghosh
  • Showmik Bhowmik
  • Ram Sarkar
  • Mita Nasipuri
Original Article

Abstract

Feature selection plays a key role in reducing the dimensionality of a feature vector by discarding redundant and irrelevant features. In this paper, a Genetic Algorithm-based hierarchical feature selection (HFS) model is designed to optimize the local and global features extracted from each handwritten word image under consideration. Two recently developed feature descriptors, based on the shape and texture of the word images, are used. Experiments are conducted on an in-house dataset of 12,000 handwritten word samples written in Bangla script. The dataset comprises the names of 80 popular cities of West Bengal, a state of India. The proposed model not only reduces the feature dimension by nearly 28%, but also improves the performance of the handwritten word recognition (HWR) technique by 1.28% over the recognition performance obtained with the unreduced feature set. Moreover, the proposed HFS-based HWR system performs better than some recently developed methods on the present dataset.
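As a rough illustration of Genetic Algorithm-based feature selection in general, the sketch below evolves binary masks over a feature vector, where a 1 keeps a feature and a 0 discards it. It is not the authors' HFS model: the hierarchy over local and global features is not reproduced, and the names `ga_select` and `toy_fitness`, the population size, the mutation rate, and the toy fitness function are all assumptions made for this example. In practice the fitness would be the recognition accuracy of a classifier trained on the selected features.

```python
import random

# Hypothetical stand-in for classifier accuracy: features 0..4 are treated
# as "relevant", and every selected feature costs a small penalty.
RELEVANT = {0, 1, 2, 3, 4}

def toy_fitness(mask):
    hits = sum(mask[i] for i in RELEVANT)
    return hits - 0.1 * sum(mask)

def ga_select(fitness, n_features, pop_size=20, generations=30,
              p_mut=0.05, seed=0):
    """Return the best binary feature mask found by a simple GA."""
    rng = random.Random(seed)
    # Random initial population of binary chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]      # truncation selection
        children = []
        while len(children) < pop_size - 1:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from either parent.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        # Elitism: carry the best mask over unchanged.
        pop = [scored[0][:]] + children
    return max(pop, key=fitness)

best_mask = ga_select(toy_fitness, n_features=20)
selected = [i for i, bit in enumerate(best_mask) if bit]
print("selected feature indices:", selected)
```

Because the best mask is copied unchanged into each new generation, the best fitness in the population never decreases from one generation to the next.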

Keywords

Hierarchical feature selection; Genetic Algorithm; Handwritten city name; Bangla script; Elliptical feature; Gradient-based feature

Notes

Acknowledgement

We would like to thank the CMATER research laboratory of the Computer Science and Engineering Department, Jadavpur University, India, for providing infrastructural support. This work is partially supported by the PURSE-II and UPE-II projects of Jadavpur University. Showmik Bhowmik is thankful to the Ministry of Electronics and Information Technology (MeitY), Govt. of India, for providing him a PhD fellowship under the Visvesvaraya PhD scheme. Ram Sarkar is partially funded by a DST grant (EMR/2016/007213).

Compliance with ethical standards

Conflict of interest

All the authors declare that they have no conflict of interest.

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science, Asutosh College, Kolkata, India
  2. Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
