A Comparative Study of Feature Ranking Methods in Recognition of Handwritten Numerals

  • Abhinaba Roy
  • Nibaran Das
  • Amit Saha
  • Ram Sarkar
  • Subhadip Basu
  • Mahantapas Kundu
  • Mita Nasipuri
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 324)

Abstract

Feature selection is an important task in the classification of any pattern. In this paper, we compute and compare the strengths of five of the most widely used feature ranking techniques in identifying the optimal subset of features for the best classification results. The feature ranking measures used here are information gain (IG), gain ratio (GR), correlation, symmetrical uncertainty (SU), and chi-square (CS). For evaluation, handwritten numeral samples from five popular scripts (Bangla, Hindi, English, Telugu, and Arabic) are used. These ranking methods are applied to the quadtree-based longest-run feature set. Experimental results are obtained and compared using a support vector machine (SVM)-based classifier.
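
The ranking criteria compared here are all standard filter measures: with H denoting Shannon entropy, IG(C; X) = H(C) - H(C|X), GR = IG/H(X), and SU = 2*IG/(H(X) + H(C)), while CS is Pearson's chi-square statistic between a (discretized) feature and the class. As a rough illustration of the filter-then-classify pipeline the abstract describes, the sketch below ranks features with two of these criteria and evaluates an SVM on the top-k subsets. It is a minimal sketch, not the authors' code: scikit-learn's digits data and mutual_info_classif are hypothetical stand-ins for the paper's numeral datasets and information-gain scores.

```python
# Illustrative sketch (not the paper's code): rank features with two of the
# five filter criteria compared in the paper and check SVM accuracy on the
# top-k feature subsets. The quadtree-based longest-run features and the
# five numeral datasets are not reproduced here; sklearn's digits set is a
# placeholder, and mutual_info_classif stands in for information gain.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.feature_selection import mutual_info_classif, chi2
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Per-feature relevance scores: mutual information (~information gain, IG)
# and the chi-square statistic (CS). chi2 requires non-negative features.
scores = {
    "IG": mutual_info_classif(X_tr, y_tr, random_state=0),
    "CS": chi2(X_tr, y_tr)[0],
}

for name, s in scores.items():
    ranking = np.argsort(s)[::-1]          # highest-scoring features first
    for k in (16, 32, 48, 64):
        top = ranking[:k]
        clf = SVC(kernel="rbf", gamma="scale").fit(X_tr[:, top], y_tr)
        acc = clf.score(X_te[:, top], y_te)
        print(f"{name}: top-{k} features -> test accuracy {acc:.3f}")
```

In this setup, each filter produces an ordering of the features; accuracies for growing top-k subsets can then be compared across the five criteria, which mirrors how the paper evaluates the ranking methods with an SVM classifier.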

Keywords

Feature selection · Handwritten numeral recognition · Filters

Copyright information

© Springer India 2015

Authors and Affiliations

  • Abhinaba Roy (1)
  • Nibaran Das (1)
  • Amit Saha (1)
  • Ram Sarkar (1)
  • Subhadip Basu (1)
  • Mahantapas Kundu (1)
  • Mita Nasipuri (1)

  1. Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
